Abstract

This paper considers the problem of guessing the realization of a finite alphabet source when some side information is provided. The only knowledge the guesser has about the source and the correlated side information is that the joint source is one among a family. A notion of redundancy is first defined and a new divergence quantity that measures this redundancy is identified. This divergence quantity shares the Pythagorean property with the Kullback-Leibler divergence. Good guessing strategies that minimize the supremum redundancy (over the family) are then identified. The min-sup value measures the richness of the uncertainty set. The min-sup redundancies for two examples, the families of discrete memoryless sources and finite-state arbitrarily varying sources, are then determined.
I. Introduction

Let $X$ be a random variable on a finite set $\mathcal{X}$ with probability mass function (PMF) given by $\{P(x) : x \in \mathcal{X}\}$. Suppose that we wish to guess the realization of this random variable $X$ by asking questions of the form "Is $X$ equal to $x$?", stepping through the elements of $\mathcal{X}$, until the answer is "Yes" ([1], [2]). If we know the PMF $P$, the best strategy is to guess in the decreasing order of $P$-probabilities.

The aim of this paper is to identify good guessing strategies and analyze their performance when the PMF $P$ is not completely known. Throughout this paper, we will assume that the only information available to the guesser is that the PMF of the source is one among a family $\mathcal{T}$ of PMFs.

By way of motivation, consider a crypto-system in which Alice wishes to send a secret message to Bob. The message is 
encrypted using a private key stream. Alice and Bob share this private key stream. The key stream is generated using a random
and perhaps biased source. The cipher-text is transmitted through a public channel. Eve, the eavesdropper, guesses one key 
stream after another until she arrives at the correct message. Eve can guess any number of times, and she knows when she 
has guessed right. She might know this, for example, when she obtains a meaningful message. From Alice's and Bob's points 
of view, how good is their key stream generating source? In particular, what is the minimum expected number of guesses that 
Eve would need to get to the correct realization? From Eve's point of view, what is her best guessing strategy? These questions 
were answered by Arikan in [2] and generalized to systems with specified key rate by Merhav and Arikan in [3]. 

Taking this example a step further, suppose that Alice and Bob have access to a few sources. How can they utilize these sources to increase the expected number of guesses Eve will need? What is Eve's guessing strategy? We answer these questions in this paper.

When $P$ is known, Massey [1] and Arikan [2] sought to lower bound the minimum expected number of guesses. For a given guessing strategy $G$, let $G(x)$ denote the number of guesses required when $X = x$. The strategy that minimizes $E[G(X)]$, the expected number of guesses, proceeds in the decreasing order of $P$-probabilities. Arikan [2] showed that the exponent of the minimum value, i.e., $\log\left[\min_G E[G(X)]\right]$, satisfies

$$H_{1/2}(P) - \log(1 + \ln|\mathcal{X}|) \le \log\left[\min_G E[G(X)]\right] \le H_{1/2}(P),$$

where $H_\alpha(P)$ is the Renyi entropy of order $\alpha > 0$. Boztaş [4] obtains a tighter upper bound.

For $\rho > 0$, Arikan [2] also considered minimization of $\left(E[G(X)^\rho]\right)^{1/\rho}$ over all guessing strategies $G$; the exponent of the minimum value satisfies

$$H_\alpha(P) - \log(1 + \ln|\mathcal{X}|) \le \frac{1}{\rho}\log\left[\min_G E[G(X)^\rho]\right] \le H_\alpha(P), \qquad (1)$$

where $\alpha = 1/(1+\rho)$.
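As a small numerical illustration of (1), the Python sketch below guesses in decreasing order of $P$-probabilities for a hypothetical four-letter PMF, confirms by brute force over all guessing orders that this order is optimal, and checks that the resulting exponent lies between the two bounds. The helper names are illustrative only. Logarithms are taken to base 2; the bounds in (1) hold in any base provided it is used consistently.

```python
import itertools
import math

def guessing_moment(P, order, rho):
    """E[G(X)^rho] when the symbols x = 0, 1, ... of P are guessed in the given order."""
    rank = {x: i + 1 for i, x in enumerate(order)}
    return sum(p * rank[x] ** rho for x, p in enumerate(P))

def renyi_entropy(P, alpha):
    """Renyi entropy H_alpha(P) in bits, for alpha > 0, alpha != 1."""
    return math.log2(sum(p ** alpha for p in P if p > 0)) / (1 - alpha)

P = [0.5, 0.2, 0.2, 0.1]      # hypothetical source PMF
rho = 1.0
alpha = 1 / (1 + rho)

order = sorted(range(len(P)), key=lambda x: -P[x])      # decreasing P-probabilities
best = guessing_moment(P, order, rho)
# Brute force over all |X|! orders confirms that sorting is optimal (tiny alphabets only).
brute = min(guessing_moment(P, perm, rho) for perm in itertools.permutations(range(len(P))))

exponent = math.log2(best) / rho
upper = renyi_entropy(P, alpha)
lower = upper - math.log2(1 + math.log(len(P)))         # subtract log(1 + ln|X|)
print(f"{lower:.3f} <= {exponent:.3f} <= {upper:.3f}")
```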

Arikan [2] applied these results to a discrete memoryless source on $\mathcal{X}$ with letter probabilities given by the PMF $P$, and obtained that the minimum guessing moment, $\min_G E[G(X^n)^\rho]$, grows exponentially with $n$. The minimum growth rate of this quantity (after normalization by $\rho$) is given by the Renyi entropy $H_\alpha(P)$. This gave an operational significance to the Renyi entropy. In particular, the minimum expected number of guesses grows exponentially with $n$ and has a minimum growth rate of $H_{1/2}(P)$. The study of $E[G(X)^\rho]$, as a function of $\rho$, is motivated by the fact that it is the convex conjugate (Legendre-Fenchel transformation) of a function that characterizes the large deviations behavior of the number of guesses. See [3] for more details.
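The per-symbol exponent for a memoryless source can be watched approaching $H_\alpha(P)$ on tiny examples. The Python sketch below assumes a hypothetical binary source with $P(1) = 0.2$ and $\rho = 1$; it enumerates all $2^n$ strings, guesses them in decreasing order of probability, and prints the normalized exponent between the per-symbol bounds obtained by applying (1) to $\mathcal{X}^n$ (where $H_\alpha(P^n) = nH_\alpha(P)$ and $|\mathcal{X}^n| = 2^n$). Enumeration is exponential in $n$, so this is only an illustration.

```python
import itertools
import math

def min_guessing_exponent(p, n, rho):
    """(1/(n*rho)) log2 min_G E[G(X^n)^rho] for an i.i.d. Bernoulli(p) source.

    The optimal list guesses the 2^n strings in decreasing order of probability.
    """
    probs = sorted((p ** sum(s) * (1 - p) ** (n - sum(s))
                    for s in itertools.product((0, 1), repeat=n)), reverse=True)
    moment = sum(q * (i + 1) ** rho for i, q in enumerate(probs))
    return math.log2(moment) / (n * rho)

p, rho = 0.2, 1.0                     # hypothetical source parameter and moment order
alpha = 1 / (1 + rho)
H = math.log2(p ** alpha + (1 - p) ** alpha) / (1 - alpha)    # single-letter H_alpha(P)
for n in (2, 4, 8, 12):
    slack = math.log2(1 + n * math.log(2)) / n               # log(1 + ln|X^n|) / n
    print(n, round(H - slack, 3), round(min_guessing_exponent(p, n, rho), 3), round(H, 3))
```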

Suppose now that the guesser only knows that the source belongs to a family $\mathcal{T}$ of PMFs. The uncertainty set may be finite or infinite in size. The guesser's strategy should not be tuned to any one particular PMF in $\mathcal{T}$, but should be designed for the entire uncertainty set. The performance of such a guessing strategy on any particular source will not be better than that of the optimal strategy for that source. Indeed, for any source $P$, the exponent of $E[G(X)^\rho]$ is at least as large as that of the optimal strategy, $E[G_P(X)^\rho]$, where $G_P$ is the guessing strategy matched to $P$ that guesses in the decreasing order of $P$-probabilities. Thus for any given strategy, and for any source $P \in \mathcal{T}$, we can define a notion of penalty or redundancy, $R(P,G)$, given by

$$R(P,G) = \frac{1}{\rho}\log E[G(X)^\rho] - \frac{1}{\rho}\log E[G_P(X)^\rho],$$

which represents the increase in the exponent of the guessing moment normalized by $\rho$.

A natural means of measuring the effectiveness of a guessing strategy $G$ on the family $\mathcal{T}$ is to find the worst redundancy over all sources in $\mathcal{T}$. In this paper, we are interested in identifying the value of

$$\min_G \sup_{P\in\mathcal{T}} R(P,G),$$

and in obtaining the $G$ that attains this min-sup value.

We first show that $R(P,G)$ is bounded on either side in terms of a divergence quantity $L_\alpha(P,Q_G)$; $Q_G$ is a PMF that depends on $G$, and $L_\alpha$ is a measure of dissimilarity between two PMFs. The above observation enables us to transform the min-sup problem above into another one of identifying

$$\inf_Q \sup_{P\in\mathcal{T}} L_\alpha(P,Q).$$

The role of $L_\alpha$ in guessing is similar to the role of Kullback-Leibler divergence in mismatched source compression. The parameter $\alpha$ is given by $\alpha = 1/(1+\rho)$. The quantity $L_\alpha$ is such that the limiting value as $\alpha \to 1$ is the Kullback-Leibler divergence. Furthermore, $L_\alpha$ shares the Pythagorean property with the Kullback-Leibler divergence [5]. The results of this paper thus generalize the "geometric" properties satisfied by the Kullback-Leibler divergence [5].

Consider the special case of guessing an $n$-string put out by a discrete memoryless source (DMS) with single letter alphabet $\mathcal{A}$. The parameters of this DMS are unknown to the guesser. Arikan and Merhav [6] proposed a "universal" guessing strategy for the family of DMSs on $\mathcal{A}$. This universal guessing strategy asymptotically achieves the minimum growth exponent for all sources in the uncertainty set. Their strategy guesses in the increasing order of empirical entropy. In the language of this paper, their results imply that the normalized redundancy suffered by the aforementioned strategy is upper bounded by a positive sequence of real numbers that vanishes as $n \to \infty$. One can interpret this fact as follows: the family of discrete memoryless sources is not "rich" enough; we have a universal guessing strategy that is asymptotically optimal.
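The universal strategy described above is simple to prototype. The Python sketch below assumes a binary alphabet, $n = 10$ and $\rho = 1$; it orders all $2^n$ strings by increasing empirical entropy without using the source parameter (ties are broken lexicographically, an arbitrary choice), and reports the normalized redundancy relative to the matched list for a few hypothetical Bernoulli sources. The function names are illustrative, and the sketch makes no claim about how fast the redundancy vanishes.

```python
import itertools
import math

def empirical_entropy(s):
    """Entropy in bits of the empirical distribution of the symbols in s."""
    n = len(s)
    return -sum(s.count(a) / n * math.log2(s.count(a) / n) for a in set(s))

def normalized_exponent(order, p, rho):
    """(1/(n*rho)) log2 E[G(X^n)^rho] for an i.i.d. Bernoulli(p) source and a fixed list."""
    n = len(order[0])
    moment = sum(p ** sum(s) * (1 - p) ** (n - sum(s)) * (i + 1) ** rho
                 for i, s in enumerate(order))
    return math.log2(moment) / (n * rho)

n, rho = 10, 1.0
strings = list(itertools.product((0, 1), repeat=n))
universal = sorted(strings, key=empirical_entropy)     # the same list for every p

gaps = {}
for p in (0.05, 0.2, 0.4):                             # hypothetical sources
    matched = sorted(strings, key=lambda s: -(p ** sum(s) * (1 - p) ** (n - sum(s))))
    gaps[p] = normalized_exponent(universal, p, rho) - normalized_exponent(matched, p, rho)
    print(p, round(gaps[p], 4))                        # normalized redundancy of the universal list
```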

The redundancy quantities studied in this paper also arise in the study of mismatch in Campbell's minimum average 
exponential coding length problem. Campbell ([7] and [8]) identified a code that depended on knowledge of the source PMF. 
The code has redundancy within a constant of the optimal value and is analogous to the Shannon code for source compression. 
Blumer and McEliece [9] studied a modified Huffman algorithm for this problem and tightened the bounds on the redundancy. 
Fischer [10] addressed the problem in the context of mismatched source compression and identified the supremum average 
exponential coding length for a family of sources. In particular, he showed that the supremum value is the supremum of the 
Renyi entropies of the sources in the family. In contrast to Fischer's work, our focus in this paper is on identifying the worst 
redundancy suffered by a code. 

Most of the results obtained in this paper were inspired by similar results for mismatched and universal source compression 
([11], [12], [13]). We now highlight some comparisons between source compression and guessing. 

Suppose that the source outputs an $n$-string of bits. In lossless source compression, one can think of an encoding scheme as asking questions of the form "Does $X^n \in E_i$?", where $\{E_i : i = 1, 2, \ldots\}$ is a carefully chosen sequence of subsets of $\mathcal{X}^n$. More specifically, one can ask the questions "Is $X_1 = 0$?", "Is $X_2 = 0$?", and so on. The goal is to minimize the number of such questions one needs to ask (on the average) to get to the realization. The minimum expected number of questions one can hope to ask, per source symbol, is essentially the Shannon entropy $H(P)$. In the context of guessing, one can only test an entire string in one attempt, i.e., ask questions of the form "Is $X^n = x^n$?". The guessing moment grows exponentially with $n$ and the minimum exponent, after scaling by $\rho$, is given by the Renyi entropy $H_\alpha(P)$.

The quantity $L_\alpha$ plays the same role as Kullback-Leibler divergence does in mismatched source compression. $L_\alpha$ shares the Pythagorean property with the Kullback-Leibler divergence [14]. Moreover, the best guessing strategy is based on a PMF that is a mixture of sources in the uncertainty set, analogous to the source compression case. The min-sup value of redundancy for the problem of compression under source uncertainty is given by the capacity of a channel [12] with inputs corresponding to the indices of the uncertainty set, and channel transition probabilities given by the various sources in the uncertainty set. We show that a similar result holds for guessing under source uncertainty. In particular, the min-sup value is the channel capacity of order $1/\alpha$ [15] of an appropriately defined channel.

The following is an outline of the paper. In Section II, we review known results for the problem of guessing, introduce the relevant measures that quantify redundancy, and show the relationship between this redundancy and the divergence quantity $L_\alpha$. In Section III, we see how the same quantities arise in the context of Campbell's minimum average exponential coding length problem. In Section IV, we pose the min-sup problem of quantifying the worst-case redundancy and identify another inf-sup problem in terms of $L_\alpha$. In Section V, we study the relations between $L_\alpha$ and other known divergence measures. In Section VI, we identify the so-called center and radius of an uncertainty set. In Section VII, we specialize our results to two examples: the family of discrete memoryless sources on finite alphabets, and the family of finite-state arbitrarily varying sources. We establish results on the asymptotic redundancies of these two uncertainty sets. We further refine the redundancy upper bound for the family of binary memoryless sources. In Section VIII, we conduct a further study of the $L_\alpha$ divergence and show that it satisfies the Pythagorean property. Section IX closes the paper with some concluding remarks.



II. Inaccuracy and redundancy in guessing

In this section, we prove previously known results in guessing. Our aim is to motivate the study of quantities that measure inaccuracy in guessing. In particular, we introduce a measure of divergence, and show how it is related to the $\alpha$-divergence of Csiszár [15].

Let $\mathcal{X}$ and $\mathcal{Y}$ be finite alphabet sets. Consider a correlated pair of random variables $(X, Y)$ with joint PMF $P$ on $\mathcal{X}\times\mathcal{Y}$. Given side information $Y = y$, we would like to guess the realization of $X$. Formally, a guessing list $G$ with side information is a function $G : \mathcal{X}\times\mathcal{Y} \to \{1, 2, \ldots, |\mathcal{X}|\}$ such that for each $y \in \mathcal{Y}$, the function $G(\cdot, y) : \mathcal{X} \to \{1, 2, \ldots, |\mathcal{X}|\}$ is a one-to-one function that denotes the order in which the elements of $\mathcal{X}$ will be guessed when the guesser observes $Y = y$. Naturally, knowing the PMF $P$, the best strategy, which minimizes the expected number of guesses given $Y = y$, is to guess in the decreasing order of $P(\cdot, y)$-probabilities. Let us denote such an order by $G_P$. Due to lack of exact knowledge of $P$, suppose we guess in the decreasing order of probabilities of another PMF $Q$. This situation leads to mismatch. In this section, we analyze the performance of guessing strategies under mismatch.

In some of the results we will have $\rho > 0$, and in others $\rho > -1$, $\rho \neq 0$. The $\rho > 0$ case is of primary interest in the context of guessing. The other case is also of interest in Campbell's average exponential coding length problem, where similar quantities are involved.

Following the proof in [2], we have the following simple result for guessing under mismatch.

Proposition 1: (Guessing under mismatch) Let $\rho > 0$. Consider a source pair $(X, Y)$ with PMF $P$. Let $Q$ be another PMF with $\mathrm{Supp}(Q) = \mathcal{X}\times\mathcal{Y}$. Let $G_Q$ be the guessing list with side information $Y$ obtained under the assumption that the PMF is $Q$, with ties broken using an arbitrary but fixed rule. Then the guessing moment for the source with PMF $P$ under $G_Q$ satisfies

$$\frac{1}{\rho}\log E[G_Q(X,Y)^\rho] \le \frac{1}{\rho}\log \sum_{y\in\mathcal{Y}}\sum_{x\in\mathcal{X}} P(x,y)\left[\sum_{a\in\mathcal{X}}\left(\frac{Q(a,y)}{Q(x,y)}\right)^{1/(1+\rho)}\right]^\rho, \qquad (2)$$

where the expectation $E$ is with respect to $P$. □



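Proof: For $\rho > 0$, for each $y \in \mathcal{Y}$, observe that

$$G_Q(x,y) \le \sum_{a\in\mathcal{X}} 1\{Q(a,y) \ge Q(x,y)\} \le \sum_{a\in\mathcal{X}}\left(\frac{Q(a,y)}{Q(x,y)}\right)^{1/(1+\rho)}$$

for each $x \in \mathcal{X}$. Raising both sides to the power $\rho > 0$, averaging with respect to $P$, and taking $\frac{1}{\rho}\log$ leads to the proposition. ■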
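Proposition 1 can be sanity-checked numerically. The Python sketch below assumes a hypothetical pair of PMFs with $|\mathcal{X}| = 3$ and $|\mathcal{Y}| = 2$, stored as nested lists indexed as `P[y][x]`; it builds $G_Q$ by ranking each row of $Q$ (ties broken by smaller index, one admissible fixed rule) and compares the two sides of (2) for a few values of $\rho$. The helper names are illustrative, and a single instance is of course not a proof.

```python
import math

# A PMF on X x Y is stored as a list of rows, one row per y: P[y][x] = P(x, y).

def guessing_list(Q):
    """G_Q: for each y, rank x by decreasing Q(x, y); ties broken by smaller x first."""
    G = []
    for row in Q:
        order = sorted(range(len(row)), key=lambda x: (-row[x], x))
        rank = [0] * len(row)
        for i, x in enumerate(order):
            rank[x] = i + 1
        G.append(rank)
    return G

def guessing_moment(P, G, rho):
    """E[G(X, Y)^rho] under the true PMF P."""
    return sum(p * g ** rho for prow, grow in zip(P, G) for p, g in zip(prow, grow))

def mismatch_sum(P, Q, rho):
    """The double sum inside the logarithm on the right side of (2)."""
    a = 1 / (1 + rho)
    return sum(p * sum((qa / q) ** a for qa in qrow) ** rho
               for prow, qrow in zip(P, Q) for p, q in zip(prow, qrow))

P = [[0.30, 0.10, 0.05], [0.05, 0.20, 0.30]]   # hypothetical true source
Q = [[0.10, 0.20, 0.20], [0.20, 0.20, 0.10]]   # hypothetical mismatched PMF, full support
for rho in (0.5, 1.0, 2.0):
    lhs = math.log2(guessing_moment(P, guessing_list(Q), rho)) / rho
    rhs = math.log2(mismatch_sum(P, Q, rho)) / rho
    print(rho, round(lhs, 4), "<=", round(rhs, 4))
```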



For a source $P$ on $\mathcal{X}\times\mathcal{Y}$, the conditional Renyi entropy of order $\alpha$, with $\alpha > 0$, $\alpha \neq 1$, is given by

$$H_\alpha(P) = \frac{\alpha}{1-\alpha}\log\sum_{y\in\mathcal{Y}}\left(\sum_{x\in\mathcal{X}} P(x,y)^\alpha\right)^{1/\alpha}. \qquad (3)$$

For the case when $|\mathcal{Y}| = 1$, i.e., when there is no side information, we may think of $P$ as simply a PMF on $\mathcal{X}$. The above conditional Renyi entropy of order $\alpha$ is then the Renyi entropy of order $\alpha$ of the source $P$, given by

$$H_\alpha(P) = \frac{1}{1-\alpha}\log\sum_{x\in\mathcal{X}} P(x)^\alpha. \qquad (4)$$

Note that the left-hand side of (3) is written as a functional of $P$ instead of the more common $H_\alpha(X \mid Y)$. We do not use the latter because the dependence on the PMF needs to be made explicit in many places in the sequel. Also note that both (3) and (4) define $H_\alpha(P)$, one in the two random variable case, and the other in the single random variable case. The actual definition being referred to will be clear from the context. It is well known that

$$0 \le H_\alpha(P) \le \log|\mathcal{X}|. \qquad (5)$$
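Definition (3) translates directly into code. The Python sketch below assumes a hypothetical joint PMF stored as `P[y][x]` and base-2 logarithms; it evaluates (3), checks that it reduces to (4) when $|\mathcal{Y}| = 1$, and checks the bounds (5) for a few orders $\alpha$ on both sides of 1. The function names are illustrative.

```python
import math

def cond_renyi(P, alpha):
    """Conditional Renyi entropy (3) in bits; P[y][x] = P(x, y), alpha > 0, alpha != 1."""
    h = sum(sum(p ** alpha for p in row) ** (1 / alpha) for row in P)
    return alpha / (1 - alpha) * math.log2(h)

def renyi(P, alpha):
    """Unconditional Renyi entropy (4) in bits."""
    return math.log2(sum(p ** alpha for p in P)) / (1 - alpha)

P = [[0.30, 0.10, 0.05], [0.05, 0.20, 0.30]]   # hypothetical joint PMF, |X| = 3
for alpha in (0.25, 0.5, 2.0, 4.0):
    print(alpha, round(cond_renyi(P, alpha), 4), "<=", round(math.log2(3), 4))
```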



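Suppose that our guessing order is "matched" to the source, i.e., we guess according to the list $G_P$. We then get the following corollary.

Corollary 2: (Matched guessing, Arikan [2]) Under the hypotheses in Proposition 1, the guessing strategy $G_P$ satisfies

$$\frac{1}{\rho}\log E[G_P(X,Y)^\rho] \le H_\alpha(P), \qquad (6)$$

where $\alpha = 1/(1+\rho)$. □

Proof: Set $Q = P$ in Proposition 1. Since $1 - \rho/(1+\rho) = \alpha$, the right side of (2) reduces to $\frac{1}{\rho}\log\sum_{y}\left(\sum_{x} P(x,y)^\alpha\right)^{1+\rho}$, which equals $H_\alpha(P)$ in (3) because $1+\rho = 1/\alpha$ and $1/\rho = \alpha/(1-\alpha)$. ■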
Let us now look at the converse direction. 

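Proposition 3: (Converse) Let $\rho > 0$. Consider a source pair $(X, Y)$ with PMF $P$. Let $G$ be an arbitrary guessing list with side information $Y$. Then there is a PMF $Q_G$ on $\mathcal{X}\times\mathcal{Y}$ with $\mathrm{Supp}(Q_G) = \mathcal{X}\times\mathcal{Y}$, and

$$\frac{1}{\rho}\log E[G(X,Y)^\rho] \ge \frac{1}{\rho}\log\sum_{y\in\mathcal{Y}}\sum_{x\in\mathcal{X}} P(x,y)\left[\sum_{a\in\mathcal{X}}\left(\frac{Q_G(a,y)}{Q_G(x,y)}\right)^{1/(1+\rho)}\right]^\rho - \log(1+\ln|\mathcal{X}|), \qquad (7)$$

where the expectation $E$ is with respect to $P$. □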
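Proof: The proof is very similar to that of [2, Theorem 1]. Observe that, because $\rho > 0$, we have

$$\sum_{y\in\mathcal{Y}}\sum_{x\in\mathcal{X}}\frac{1}{G(x,y)^{1+\rho}} = c < \infty.$$

Define the PMF $Q_G$ as

$$Q_G(x,y) = \frac{1}{c\,G(x,y)^{1+\rho}}, \quad \forall (x,y) \in \mathcal{X}\times\mathcal{Y}.$$

Note that $\mathrm{Supp}(Q_G) = \mathcal{X}\times\mathcal{Y}$. Clearly, guessing in the decreasing order of $Q_G$-probabilities leads to the guessing order $G$. By virtue of the definition of $Q_G$, we have

$$\sum_{y}\sum_{x} P(x,y)\left[\sum_{a\in\mathcal{X}}\left(\frac{Q_G(a,y)}{Q_G(x,y)}\right)^{1/(1+\rho)}\right]^\rho = \sum_{y}\sum_{x} P(x,y)\,G(x,y)^\rho\left[\sum_{a\in\mathcal{X}}\frac{1}{G(a,y)}\right]^\rho \le (1+\ln|\mathcal{X}|)^\rho\sum_{y}\sum_{x} P(x,y)\,G(x,y)^\rho, \qquad (8)$$

where the last inequality follows from (as in [2])

$$\sum_{a\in\mathcal{X}}\frac{1}{G(a,y)} = \sum_{i=1}^{|\mathcal{X}|}\frac{1}{i} \le 1 + \ln|\mathcal{X}|.$$

The proposition follows from (8) upon taking $\frac{1}{\rho}\log$ of both sides. ■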
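The construction of $Q_G$ in the proof can be exercised on a toy example. The Python sketch below assumes a hypothetical source with $|\mathcal{X}| = 3$, $|\mathcal{Y}| = 2$ and $\rho = 1$; it enumerates all $3!^2 = 36$ guessing lists with side information, builds $Q_G$ with the normalization used above, and records the largest amount by which the right side of (7) exceeds the left side, which should never be positive. The helper names are illustrative.

```python
import itertools
import math

def q_from_list(G, rho):
    """Q_G(x, y) = 1 / (c * G(x, y)^(1 + rho)), normalized over all of X x Y."""
    c = sum(g ** -(1 + rho) for row in G for g in row)
    return [[g ** -(1 + rho) / c for g in row] for row in G]

def guessing_moment(P, G, rho):
    return sum(p * g ** rho for prow, grow in zip(P, G) for p, g in zip(prow, grow))

def mismatch_sum(P, Q, rho):
    """Double sum inside the logarithm in (7)."""
    a = 1 / (1 + rho)
    return sum(p * sum((qa / q) ** a for qa in qrow) ** rho
               for prow, qrow in zip(P, Q) for p, q in zip(prow, qrow))

P = [[0.30, 0.10, 0.05], [0.05, 0.20, 0.30]]   # hypothetical source
rho = 1.0
nuisance = math.log2(1 + math.log(3))            # log(1 + ln|X|) with |X| = 3
perms = [[r + 1 for r in perm] for perm in itertools.permutations(range(3))]
worst_gap = -math.inf
for G in itertools.product(perms, repeat=2):     # every guessing list with side information
    Q = q_from_list(G, rho)
    lhs = math.log2(guessing_moment(P, G, rho)) / rho
    rhs = math.log2(mismatch_sum(P, Q, rho)) / rho - nuisance
    worst_gap = max(worst_gap, rhs - lhs)        # (7) says this stays <= 0
print("largest value of rhs - lhs:", worst_gap)
```

Because $\sum_a 1/G(a,y)$ equals $1 + 1/2 + 1/3$ for every $y$ in this example, the gap is the same for all 36 lists up to rounding.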

Observe the similarity of the terms in the right-hand sides of equations (2) and (7) in Propositions 1 and 3, respectively. The analog of this term in mismatched source compression is $-\sum_{x\in\mathcal{X}} P(x)\log Q(x)$, which is the expected length (ignoring integer constraints) of a codebook built using a mismatched PMF $Q$. The Shannon inequality (see, for example, [16]) states that

$$-\sum_{x\in\mathcal{X}} P(x)\log Q(x) \ge -\sum_{x\in\mathcal{X}} P(x)\log P(x) = H(P).$$

The next inequality is analogous to the Shannon inequality. We can interpret it as follows: if we guess according to some mismatched distribution, then the expected number of guesses can only be larger. We will let $\alpha = 1/(1+\rho)$ and expand the range of $\alpha$ to $0 < \alpha < \infty$, $\alpha \neq 1$. A special case (when no side information is available) was shown by Fischer (cf. [10, Theorem 1.3]).



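Proposition 4: (Analog of Shannon inequality) Let $\alpha > 0$, $\alpha \neq 1$, and let $\rho = 1/\alpha - 1$, so that $\alpha = 1/(1+\rho)$. Then

$$\frac{1}{\rho}\log\sum_{y\in\mathcal{Y}}\sum_{x\in\mathcal{X}} P(x,y)\left[\sum_{a\in\mathcal{X}}\left(\frac{Q(a,y)}{Q(x,y)}\right)^{1/(1+\rho)}\right]^\rho \ge H_\alpha(P), \qquad (9)$$

with equality if and only if $P = Q$. □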



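Proof: We will prove this directly using Hölder's inequality. The right side of (9) is bounded. Without loss of generality, we may assume that the left side of (9) is finite, for otherwise the inequality trivially holds and $P \neq Q$. We may therefore assume $\mathrm{Supp}(P) \subseteq \mathrm{Supp}(Q)$ under $0 < \alpha < 1$, and $\mathrm{Supp}(P) \cap \mathrm{Supp}(Q) \neq \emptyset$ under $1 < \alpha < \infty$, which are the conditions under which the left side of (9) is finite.

With $\alpha = 1/(1+\rho)$, (9) is equivalent to

$$\mathrm{sign}(\rho)\sum_{y\in\mathcal{Y}}\sum_{x\in\mathcal{X}} P(x,y)\left[\sum_{a\in\mathcal{X}}\left(\frac{Q(a,y)}{Q(x,y)}\right)^{1/(1+\rho)}\right]^\rho \ge \mathrm{sign}(\rho)\sum_{y\in\mathcal{Y}}\left(\sum_{x\in\mathcal{X}} P(x,y)^{1/(1+\rho)}\right)^{1+\rho}.$$

The above inequality holds term by term for each $y \in \mathcal{Y}$, a fact that can be verified by using the Hölder inequality (in its reverse form when $\lambda < 0$)

$$\mathrm{sign}(\lambda)\left(\sum_{x} u_x\right)^{\lambda}\left(\sum_{x} v_x\right)^{1-\lambda} \ge \mathrm{sign}(\lambda)\sum_{x} u_x^{\lambda} v_x^{1-\lambda} \qquad (10)$$

with $\lambda = \rho/(1+\rho) = 1 - \alpha$, $u_x = Q(x,y)^{1/(1+\rho)}$, $v_x = P(x,y)\,Q(x,y)^{-\rho/(1+\rho)}$, and raising the resulting inequality to the power $1 + \rho > 0$. From the condition for equality in (10), equality holds in (9) if and only if $P = Q$. ■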

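Proposition 3 motivates us to define the following quantity that will be the focus of this paper:

$$L_\alpha(P,Q) \triangleq \frac{1}{\rho}\log\sum_{y\in\mathcal{Y}}\sum_{x\in\mathcal{X}} P(x,y)\left[\sum_{a\in\mathcal{X}}\left(\frac{Q(a,y)}{Q(x,y)}\right)^{\alpha}\right]^{\rho} - H_\alpha(P), \qquad (11)$$

where $\rho = 1/\alpha - 1$. Proposition 4 indicates that $L_\alpha(P,Q) \ge 0$, with equality if and only if $P = Q$.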
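A direct transcription of (11) is sketched below in Python. It assumes base-2 logarithms, a $Q$ with full support, and hypothetical toy PMFs stored as `P[y][x]`; it evaluates $L_\alpha(P,P)$, which should vanish up to rounding, and $L_\alpha(P,Q)$, which Proposition 4 says is nonnegative, for orders on both sides of $\alpha = 1$. The function name `L_alpha` is illustrative.

```python
import math

def L_alpha(P, Q, alpha):
    """L_alpha(P, Q) from (11) in bits; P[y][x] = P(x, y), Q with full support.

    Valid for 0 < alpha < inf, alpha != 1, with rho = 1/alpha - 1.
    """
    rho = 1 / alpha - 1
    total = sum(p * sum((qa / q) ** alpha for qa in qrow) ** rho
                for prow, qrow in zip(P, Q) for p, q in zip(prow, qrow))
    h = sum(sum(p ** alpha for p in row) ** (1 / alpha) for row in P)
    H = alpha / (1 - alpha) * math.log2(h)       # conditional Renyi entropy (3)
    return math.log2(total) / rho - H

P = [[0.30, 0.10, 0.05], [0.05, 0.20, 0.30]]    # hypothetical source
Q = [[0.10, 0.20, 0.20], [0.20, 0.20, 0.10]]    # hypothetical mismatched PMF
for alpha in (0.5, 0.8, 1.5, 3.0):
    print(alpha, L_alpha(P, P, alpha), L_alpha(P, Q, alpha))
```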

Just as the Shannon inequality can be employed to show the converse part of the source coding theorem, we employ Proposition 4 to get the converse part of a guessing theorem. We thus have a slightly different proof of [2, Theorem 1(a)].



Theorem 5: (Arikan's Guessing Theorem [2]) Let $\rho > 0$. Consider a source pair $(X, Y)$ with PMF $P$. Let $\alpha = 1/(1+\rho)$. Then

$$H_\alpha(P) - \log(1 + \ln|\mathcal{X}|) \le \frac{1}{\rho}\log\left(\min_G E[G(X,Y)^\rho]\right) \le H_\alpha(P).$$

□

Proof: It is easy to see that the minimum is attained when the guessing list is $G_P$, i.e., when guessing proceeds in the decreasing order of $P$-probabilities. Application of Proposition 3 with $G = G_P$ and Proposition 4 with $Q = Q_{G_P}$ yields the first inequality. The upper bound follows from Corollary 2. ■

Remarks: 1) $Q_{G_P}$ may be different from $P$ even though they lead to the same guessing order.

2) Theorem 5 gives an operational meaning to $H_\alpha(P)$; it indicates the exponent of the minimum guessing moment to within $\log(1 + \ln|\mathcal{X}|)$.

3) Loosely speaking, Proposition 4 indicates that mismatched guessing will perform worse than matched guessing. The looseness is due to the looseness of the bound in Theorem 5.

Suppose now that we use an arbitrary guessing strategy $G$ to guess $X$ with side information $Y$, when the PMF of the source $(X, Y)$ is $P$. $G$ need not be matched to the source, as would be the case when the source statistics are unknown. Let us define its redundancy in guessing $X$ with side information $Y$ when the source is $P$ as follows:

$$R(P,G) \triangleq \frac{1}{\rho}\log E[G(X,Y)^\rho] - \frac{1}{\rho}\log E[G_P(X,Y)^\rho]. \qquad (12)$$

The dependence of $R(P,G)$ on $\rho$ is understood and suppressed. The following theorem bounds the redundancy on either side.

Theorem 6: Let $\rho > 0$, $\alpha = 1/(1+\rho)$. Consider a source pair $(X, Y)$ with PMF $P$. Let $G$ be an arbitrary guessing list with side information $Y$ and $Q_G$ the associated PMF given by Proposition 3. Then

$$\left|R(P,G) - L_\alpha(P,Q_G)\right| \le \log(1 + \ln|\mathcal{X}|). \qquad (13)$$

□

Proof: The inequality $R(P,G) \le L_\alpha(P,Q_G) + \log(1 + \ln|\mathcal{X}|)$ follows from Proposition 1 applied with $Q = Q_G$, the first inequality of Theorem 5, and (11).

The inequality $R(P,G) \ge L_\alpha(P,Q_G) - \log(1 + \ln|\mathcal{X}|)$ follows from Proposition 3, the second inequality of Theorem 5, and (11). ■
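Theorem 6 can be checked exhaustively on a toy problem. The Python sketch below assumes a hypothetical source with $|\mathcal{X}| = 3$, $|\mathcal{Y}| = 2$ and $\rho = 2$; for each of the 36 guessing lists it computes the redundancy (12), the PMF $Q_G$ from the proof of Proposition 3, and the deviation $|R(P,G) - L_\alpha(P,Q_G)|$, whose maximum (13) bounds by $\log(1 + \ln 3)$. The helper names are illustrative.

```python
import itertools
import math

def guessing_moment(P, G, rho):
    return sum(p * g ** rho for prow, grow in zip(P, G) for p, g in zip(prow, grow))

def ranks_by(row):
    """Rank positions by decreasing value (ties broken by index)."""
    order = sorted(range(len(row)), key=lambda x: (-row[x], x))
    rank = [0] * len(row)
    for i, x in enumerate(order):
        rank[x] = i + 1
    return rank

def L_alpha(P, Q, alpha):
    """L_alpha(P, Q) from (11), in bits."""
    rho = 1 / alpha - 1
    total = sum(p * sum((qa / q) ** alpha for qa in qrow) ** rho
                for prow, qrow in zip(P, Q) for p, q in zip(prow, qrow))
    h = sum(sum(p ** alpha for p in row) ** (1 / alpha) for row in P)
    return math.log2(total) / rho - alpha / (1 - alpha) * math.log2(h)

P = [[0.30, 0.10, 0.05], [0.05, 0.20, 0.30]]   # hypothetical source
rho = 2.0
alpha = 1 / (1 + rho)
G_P = [ranks_by(row) for row in P]
opt = math.log2(guessing_moment(P, G_P, rho)) / rho
nuisance = math.log2(1 + math.log(3))
perms = [[r + 1 for r in perm] for perm in itertools.permutations(range(3))]
max_dev = 0.0
for G in itertools.product(perms, repeat=2):
    R = math.log2(guessing_moment(P, G, rho)) / rho - opt      # redundancy (12)
    c = sum(g ** -(1 + rho) for row in G for g in row)
    Q_G = [[g ** -(1 + rho) / c for g in row] for row in G]    # PMF from Proposition 3
    max_dev = max(max_dev, abs(R - L_alpha(P, Q_G, alpha)))
print(f"max |R - L| = {max_dev:.4f}, bound = {nuisance:.4f}")
```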

Remark: It is possible that $P$ and $Q$ lead to the same guessing order, i.e., $G_P = G_Q$. Then $R(P, G_Q) = R(P, G_P) = 0$. Yet it is possible that $L_\alpha(P,Q)$ and $L_\alpha(P,Q_{G_Q})$ are nonzero. This remains consistent with Theorem 6, since (13) only bounds $R(P,G_Q)$ on either side to within $\log(1 + \ln|\mathcal{X}|)$, and $L_\alpha$ is therefore not an entirely accurate measure of $R(P,G_Q)$. One can only conclude that

$$L_\alpha(P, Q_{G_Q}) \le \log(1 + \ln|\mathcal{X}|).$$

A similar looseness is present in source compression with mismatch, where the "nuisance" term is not $\log(1 + \ln|\mathcal{X}|)$ but the constant 1. Nevertheless, the guessing examples in Section VII show how to make good use of these bounds. See also the discussion following Theorem 8 at the end of the next section.



III. Campbell's coding theorem and redundancy 

Campbell in [7] and [8] gave another operational meaning to the Renyi entropy of order $\alpha > 0$. In this section we show that $L_\alpha$ arises as "inaccuracy" in this problem as well, when we encode according to a mismatched source. To be consistent with the development in the previous section, we will assume that $X$ is coded when the source coder has side information $Y$.

Let $\mathcal{X}$ and $\mathcal{Y}$ be finite alphabet sets as before. Let the true source probabilities be given by the PMF $P$ on $\mathcal{X}\times\mathcal{Y}$. We wish to encode each realization of $X$ using a variable-length code, given side information $Y$. More precisely, let the (nonnegative) integer code lengths $l(x,y)$ satisfy the Kraft inequality,

$$\sum_{x\in\mathcal{X}} 2^{-l(x,y)} \le 1, \quad \forall y \in \mathcal{Y}.$$

The problem is then to choose $l$ among those that satisfy the Kraft inequality so that the following is minimized:

$$\frac{1}{\rho}\log E\left[2^{\rho\, l(X,Y)}\right], \quad -1 < \rho < \infty,\ \rho \neq 0, \qquad (14)$$

where the expectation $E$ is with respect to the PMF $P$. As $\rho \to 0$, this quantity tends to the expected length of the code, $E[l(X,Y)]$.

Observe that we may assume that $\sum_{x\in\mathcal{X}} 2^{-l(x,y)} > 1/2$ for each $y$; otherwise we can reduce all lengths $l(\cdot, y)$ uniformly by 1, still satisfy the Kraft inequality, and get a strictly smaller value for (14). Henceforth, we focus only on length functions that satisfy

$$\frac{1}{2} < \sum_{x\in\mathcal{X}} 2^{-l(x,y)} \le 1, \quad \forall y \in \mathcal{Y}. \qquad (15)$$
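Theorem 7: (Campbell's Coding Theorem, Campbell [7]) Let $-1 < \rho < \infty$, $\rho \neq 0$. Consider a source with PMF $P$. Let $\alpha = 1/(1+\rho)$. Then

$$H_\alpha(P) \le \frac{1}{\rho}\log\left(\min_l E\left[2^{\rho\, l(X,Y)}\right]\right) < H_\alpha(P) + 1,$$

where the minimization is over all those length functions that satisfy (15). □

For a PMF $Q$ on $\mathcal{X}\times\mathcal{Y}$, let $l_Q$ be defined by

$$l_Q(x,y) \triangleq \left\lceil -\log\frac{Q(x,y)^\alpha}{\sum_{a\in\mathcal{X}} Q(a,y)^\alpha}\right\rceil \qquad (16)$$

$$= \left\lceil -\log Q'(x \mid y)\right\rceil, \qquad (17)$$

where $\lceil\cdot\rceil$ refers to the ceiling function and $Q'(\cdot \mid y)$ is a conditional PMF on $\mathcal{X}$. Clearly, $l_Q$ satisfies (15). Analogously, for any length function $l$ satisfying (15), we can define a PMF on $\mathcal{X}\times\mathcal{Y}$ as follows:

$$Q_l(x,y) \triangleq \frac{2^{-l(x,y)/\alpha}}{\sum_{b\in\mathcal{Y}}\sum_{a\in\mathcal{X}} 2^{-l(a,b)/\alpha}}. \qquad (18)$$

We can easily check that $l_{Q_l} = l$.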
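The maps $Q \mapsto l_Q$ and $l \mapsto Q_l$ are easy to prototype. The Python sketch below uses base-2 logarithms and, as an assumption for illustration, normalizes $Q_l(x,y) \propto 2^{-l(x,y)/\alpha}$ over all of $\mathcal{X}\times\mathcal{Y}$; since $l_Q$ depends on $Q$ only through the ratios $Q(a,y)/Q(x,y)$, any other normalization across $y$ gives the same lengths. It checks (15) for $l_P$ and the round trip $l_{Q_l} = l$ on hypothetical data; the function names are illustrative.

```python
import math

def tilt(row, alpha):
    """Tilted conditional PMF Q'(. | y), proportional to Q(., y)^alpha."""
    s = sum(q ** alpha for q in row)
    return [q ** alpha / s for q in row]

def lengths_from_pmf(Q, alpha):
    """l_Q(x, y) = ceil(-log2 Q'(x | y)), as in (16)-(17)."""
    return [[math.ceil(-math.log2(t)) for t in tilt(row, alpha)] for row in Q]

def pmf_from_lengths(l, alpha):
    """Q_l(x, y) proportional to 2^(-l(x, y) / alpha), normalized over X x Y."""
    w = [[2.0 ** (-v / alpha) for v in row] for row in l]
    c = sum(map(sum, w))
    return [[v / c for v in row] for row in w]

def kraft_sums(l):
    return [sum(2.0 ** -v for v in row) for row in l]

alpha = 0.5
P = [[0.30, 0.10, 0.05], [0.05, 0.20, 0.30]]   # hypothetical source
l_P = lengths_from_pmf(P, alpha)
print(l_P, kraft_sums(l_P))                     # each Kraft sum should lie in (1/2, 1]

l = [[1, 2, 3], [2, 2, 2]]                      # hypothetical lengths satisfying (15)
print(lengths_from_pmf(pmf_from_lengths(l, alpha), alpha) == l)
```

The round trip relies on the strict lower bound in (15): a Kraft sum in $(1/2, 1]$ keeps $-\log Q_l'(x \mid y)$ within $(l(x,y) - 1, l(x,y)]$, so the ceiling recovers $l(x,y)$.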

Let us define the redundancy for any $l$ satisfying (15) as

$$R_c(P,l) \triangleq \frac{1}{\rho}\log E\left[2^{\rho\, l(X,Y)}\right] - \frac{1}{\rho}\log\left(\min_g E\left[2^{\rho\, g(X,Y)}\right]\right),$$

analogous to the definition without side information in [9]. Following the same sequence of steps as in the mismatched guessing problem, it is straightforward to show the following:

Theorem 8: Let $-1 < \rho < \infty$, $\rho \neq 0$, $\alpha = 1/(1+\rho)$. Consider a source pair $(X, Y)$ with PMF $P$ on $\mathcal{X}\times\mathcal{Y}$. Let $l$ be a length function that denotes an encoding of $X$ with side information $Y$, and $Q_l$ the associated PMF given by (18). Then

$$\left|R_c(P,l) - L_\alpha(P,Q_l)\right| \le 1. \qquad (19)$$

□



We interpret $L_\alpha(P, Q_l)$ as the penalty for mismatched coding when $Q_l$ is not matched to $P$. $L_\alpha(P, Q_l)$ is indicative of the redundancy to within a constant, as the Kullback-Leibler divergence is in mismatched source compression. By comparing (19) with (13), we see that the nuisance term in this problem is a constant that does not depend on the size of the source alphabet; $L_\alpha(P, Q_l)$ is therefore a more faithful representation of $R_c(P, l)$ than $L_\alpha(P, Q_G)$ is of $R(P, G)$.
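Theorems 7 and 8 can be checked by brute force on a toy problem. The Python sketch below assumes a hypothetical source with $|\mathcal{X}| = 3$, $|\mathcal{Y}| = 2$ and $\rho = 1$, and searches all Kraft-feasible length vectors with entries at most 6 (ample for three symbols); for $\rho > 0$ the objective separates across $y$, so each row is minimized on its own. It then builds $l_Q$ from a hypothetical mismatched $Q$, forms $Q_{l_Q}$ with the normalization $Q_l \propto 2^{-l/\alpha}$ assumed for illustration, and compares $R_c$ with $L_\alpha$. Helper names are illustrative.

```python
import itertools
import math

def tilt(row, alpha):
    s = sum(q ** alpha for q in row)
    return [q ** alpha / s for q in row]

def campbell_cost(P, l, rho):
    """(1/rho) log2 E[2^(rho * l(X, Y))], the quantity (14)."""
    return math.log2(sum(p * 2.0 ** (rho * v) for prow, lrow in zip(P, l)
                         for p, v in zip(prow, lrow))) / rho

def L_alpha(P, Q, alpha):
    rho = 1 / alpha - 1
    total = sum(p * sum((qa / q) ** alpha for qa in qrow) ** rho
                for prow, qrow in zip(P, Q) for p, q in zip(prow, qrow))
    h = sum(sum(p ** alpha for p in row) ** (1 / alpha) for row in P)
    return math.log2(total) / rho - alpha / (1 - alpha) * math.log2(h)

P = [[0.30, 0.10, 0.05], [0.05, 0.20, 0.30]]   # hypothetical source
rho = 1.0
alpha = 1 / (1 + rho)
H = alpha / (1 - alpha) * math.log2(sum(sum(p ** alpha for p in row) ** (1 / alpha) for row in P))

# Kraft-feasible length vectors for a single y, entries in 0..6.
rows = [r for r in itertools.product(range(7), repeat=3) if sum(2.0 ** -v for v in r) <= 1]
best = [min(rows, key=lambda r: sum(p * 2.0 ** (rho * v) for p, v in zip(prow, r)))
        for prow in P]
opt = campbell_cost(P, best, rho)
print(f"{H:.4f} <= {opt:.4f} < {H + 1:.4f}")    # Theorem 7

Q = [[0.10, 0.20, 0.20], [0.20, 0.20, 0.10]]   # hypothetical mismatched PMF
l_Q = [[math.ceil(-math.log2(t)) for t in tilt(row, alpha)] for row in Q]
w = [[2.0 ** (-v / alpha) for v in row] for row in l_Q]
c = sum(map(sum, w))
Q_l = [[v / c for v in row] for row in w]      # PMF associated with l_Q
R_c = campbell_cost(P, l_Q, rho) - opt
print(f"R_c = {R_c:.4f}, L = {L_alpha(P, Q_l, alpha):.4f}")   # Theorem 8: |R_c - L| <= 1
```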



IV. Problem statement 

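Let $\mathcal{T}$ denote a set of PMFs on the finite alphabet $\mathcal{X}\times\mathcal{Y}$. $\mathcal{T}$ may be infinite in size. Associated with $\mathcal{T}$ is a family $\mathfrak{T}$ of measurable subsets of $\mathcal{T}$, and thus $(\mathcal{T}, \mathfrak{T})$ is a measurable space. We assume that for every $(x, y) \in \mathcal{X}\times\mathcal{Y}$, the mapping $P \mapsto P(x,y)$ is $\mathfrak{T}$-measurable.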

For a fixed $\rho > 0$, we seek a good guessing strategy $G$ that works well for all $P \in \mathcal{T}$. $G$ can depend on knowledge of $\mathcal{T}$, but not on the actual source PMF. More precisely, for $P \in \mathcal{T}$, the redundancy $R(P, G)$ when the true source is $P$ and the guessing list is $G$ is given by (12). The worst redundancy under this guessing strategy is given by

$$\sup_{P\in\mathcal{T}} R(P, G).$$

Our aim is to minimize this worst redundancy over all guessing strategies, i.e., find a $G$ that attains the minimum

$$R^* = \min_G \sup_{P\in\mathcal{T}} R(P, G). \qquad (20)$$

In view of Theorem 6, the following quantity is clearly relevant for $0 < \alpha < 1$. The definition, however, is wider in scope.

Definition 9: For $0 < \alpha < \infty$, $\alpha \neq 1$,

$$C = \min_Q \sup_{P\in\mathcal{T}} L_\alpha(P, Q). \qquad (21)$$

The following theorem justifies the use of "min" instead of "inf".
Theorem 10: There exists a unique PMF $Q^*$ such that

$$C = \sup_{P\in\mathcal{T}} L_\alpha(P, Q^*) = \inf_Q \sup_{P\in\mathcal{T}} L_\alpha(P, Q).$$

□

The proof is in Section VI-C.

Remarks: 1) $C \le \log|\mathcal{X}|$ and is therefore finite. Indeed, take $Q$ to be the uniform PMF on $\mathcal{X}\times\mathcal{Y}$. Each inner sum in (11) then equals $|\mathcal{X}|$, so

$$L_\alpha(P, Q) = \log|\mathcal{X}| - H_\alpha(P) \le \log|\mathcal{X}|, \quad \forall P \in \mathcal{T}.$$

2) The minimizing $Q^*$ has the geometric interpretation of a center of the uncertainty set $\mathcal{T}$. Accordingly, $C$ plays the role of radius; all elements in the uncertainty set $\mathcal{T}$ are within a "squared distance" $C$ from the center $Q^*$. The reason for describing $L_\alpha(P,Q)$ as a "squared distance" will become clear after Proposition 24.
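Remark 1 gives a cheap upper bound on the radius. A minimal Python sketch, assuming a hypothetical three-member family on $\mathcal{X}\times\mathcal{Y}$ with $|\mathcal{X}| = 3$, $|\mathcal{Y}| = 2$ and $\alpha = 1/2$, checks the identity $L_\alpha(P,U) = \log|\mathcal{X}| - H_\alpha(P)$ for the uniform PMF $U$, and hence the bound $C \le \sup_P L_\alpha(P,U) \le \log|\mathcal{X}|$. The helper names are illustrative.

```python
import math

def L_alpha(P, Q, alpha):
    """L_alpha(P, Q) from (11) in bits; P[y][x] = P(x, y)."""
    rho = 1 / alpha - 1
    total = sum(p * sum((qa / q) ** alpha for qa in qrow) ** rho
                for prow, qrow in zip(P, Q) for p, q in zip(prow, qrow))
    h = sum(sum(p ** alpha for p in row) ** (1 / alpha) for row in P)
    return math.log2(total) / rho - alpha / (1 - alpha) * math.log2(h)

def cond_renyi(P, alpha):
    """Conditional Renyi entropy (3) in bits."""
    h = sum(sum(p ** alpha for p in row) ** (1 / alpha) for row in P)
    return alpha / (1 - alpha) * math.log2(h)

family = [                                       # hypothetical uncertainty set
    [[0.30, 0.10, 0.05], [0.05, 0.20, 0.30]],
    [[0.10, 0.10, 0.10], [0.40, 0.20, 0.10]],
    [[0.60, 0.05, 0.05], [0.10, 0.10, 0.10]],
]
U = [[1 / 6] * 3 for _ in range(2)]              # uniform PMF on X x Y
alpha = 0.5
radius_bound = max(L_alpha(P, U, alpha) for P in family)
print(radius_bound, "<=", math.log2(3))
```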

The following result shows how to find good guessing schemes under uncertainty. 

Theorem 11: (Guessing under uncertainty) Let $\mathcal{T}$ be a set of PMFs. There exists a guessing list $G^*$ for $X$ with side information $Y$ such that

$$\sup_{P\in\mathcal{T}} R(P, G^*) \le C + \log(1 + \ln|\mathcal{X}|).$$

Conversely, for any arbitrary guessing strategy $G$, the worst-case redundancy is at least $C - \log(1 + \ln|\mathcal{X}|)$, i.e.,

$$\sup_{P\in\mathcal{T}} R(P, G) \ge C - \log(1 + \ln|\mathcal{X}|).$$

□

Proof: Let $Q^*$ be the PMF on $\mathcal{X}\times\mathcal{Y}$ that attains the minimum in (21), i.e.,

$$C = \sup_{P\in\mathcal{T}} L_\alpha(P, Q^*). \qquad (22)$$

Let $G^* = G_{Q^*}$. Then

$$R(P, G^*) \le L_\alpha(P, Q^*) + \log(1 + \ln|\mathcal{X}|) \qquad (23)$$

follows from Proposition 1 applied with $Q = Q^*$, the first inequality of Theorem 5, and (11), as in the proof of Theorem 6. After taking the supremum over all $P \in \mathcal{T}$, and after substitution of (22), we get

$$\sup_{P\in\mathcal{T}} R(P, G^*) \le \sup_{P\in\mathcal{T}} L_\alpha(P, Q^*) + \log(1 + \ln|\mathcal{X}|) = C + \log(1 + \ln|\mathcal{X}|),$$

which proves the first statement.

For any guessing strategy $G$, observe that Theorem 6 implies that

$$R(P, G) \ge L_\alpha(P, Q_G) - \log(1 + \ln|\mathcal{X}|),$$

and therefore

$$\sup_{P\in\mathcal{T}} R(P, G) \ge \sup_{P\in\mathcal{T}} L_\alpha(P, Q_G) - \log(1 + \ln|\mathcal{X}|) \ge C - \log(1 + \ln|\mathcal{X}|),$$

which proves the second statement. ■

Remarks: 1) Thus one approach to obtain the minimum in (20) is to identify the minimum value in (21). This minimum value will be within $\log(1 + \ln|\mathcal{X}|)$ of $R^*$ in (20). Moreover, the corresponding minimizer $Q^*$ can be used to generate a guessing strategy.

2) Theorem 11 can be easily restated for Campbell's coding problem. The nuisance term $\log(1 + \ln|\mathcal{X}|)$ is then replaced by the constant 1.

3) The converse part of Theorem 11 is meaningful only when $C > \log(1 + \ln|\mathcal{X}|)$. This will hold, for example, when the uncertainty set is sufficiently rich. The finite-state, arbitrarily varying source is one such example. Observe that if $\mathcal{X}\times\mathcal{Y} = \mathcal{A}^n\times\mathcal{B}^n$, then $\log(1 + \ln|\mathcal{X}|) = \log(1 + n\ln|\mathcal{A}|)$ grows logarithmically with $n$ if $|\mathcal{A}| \ge 2$. The uncertainty set will be rich enough for the converse to be meaningful if $C$ grows with $n$ at a faster rate.
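Remark 1 suggests a simple numerical recipe when $\mathcal{T}$ is small: approximate the minimizer in (21) and guess according to it. The Python sketch below is a brute-force illustration only. It assumes no side information, $|\mathcal{X}| = 3$, $\rho = 1$, a hypothetical two-member family, and a uniform grid on the interior of the probability simplex as a stand-in for a proper optimizer. It reports a grid estimate `C_grid` of $C$, the approximate center, and the worst-case redundancy of the list built from that center, which by the argument behind (23) cannot exceed `C_grid` plus $\log(1 + \ln 3)$. Helper names are illustrative.

```python
import math

def L_alpha(P, Q, alpha):
    """L_alpha(P, Q) from (11) with no side information (|Y| = 1), in bits."""
    rho = 1 / alpha - 1
    total = sum(p * sum((qa / q) ** alpha for qa in Q) ** rho for p, q in zip(P, Q))
    H = math.log2(sum(p ** alpha for p in P)) / (1 - alpha)     # Renyi entropy (4)
    return math.log2(total) / rho - H

def guessing_moment(P, Q, rho):
    """E[G_Q(X)^rho] under P, guessing in decreasing order of Q-probabilities."""
    order = sorted(range(len(Q)), key=lambda x: -Q[x])
    return sum(P[x] * (i + 1) ** rho for i, x in enumerate(order))

def redundancy(P, Q, rho):
    """R(P, G_Q) from (12)."""
    return (math.log2(guessing_moment(P, Q, rho)) - math.log2(guessing_moment(P, P, rho))) / rho

family = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]    # hypothetical uncertainty set on |X| = 3
rho = 1.0
alpha = 1 / (1 + rho)
N = 100                                         # grid resolution, an arbitrary choice
grid = [(i / N, j / N, (N - i - j) / N)        # interior points of the simplex
        for i in range(1, N) for j in range(1, N - i)]

# Grid estimate of C = min_Q max_P L_alpha(P, Q) and of the center Q*.
C_grid, Q_star = min((max(L_alpha(P, Q, alpha) for P in family), Q) for Q in grid)

worst = max(redundancy(P, Q_star, rho) for P in family)
nuisance = math.log2(1 + math.log(3))           # log(1 + ln|X|)
print(f"C_grid = {C_grid:.4f}, Q* ~ {Q_star}")
print(f"sup_P R(P, G_Q*) = {worst:.4f} <= {C_grid + nuisance:.4f}")
```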
V. Relations between $L_\alpha$ and other divergence quantities

Having shown how $L_\alpha(P, Q)$ arises as a penalty function for mismatched guessing and coding, we now study it in greater detail and relate it to other divergence quantities. The relationships we discover here will be useful in the sequel. Throughout this section, $0 < \alpha < \infty$, $\alpha \neq 1$. Accordingly, $-1 < \rho < \infty$, $\rho \neq 0$. Let $P$ and $Q$ be PMFs on $\mathcal{X}\times\mathcal{Y}$.

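1) As we saw before, $L_\alpha(P, Q) \ge 0$, with equality if and only if $P = Q$.

2) $L_\alpha(P, Q) = \infty$ if and only if $\mathrm{Supp}(P) \cap \mathrm{Supp}(Q) = \emptyset$, or $\alpha < 1$ and $\mathrm{Supp}(P) \not\subseteq \mathrm{Supp}(Q)$.

3) Given the joint PMF $P$, let us define the "tilted" conditional PMF on $\mathcal{X}$ as follows:

$$P'(x \mid y) \triangleq \begin{cases}\dfrac{P(x,y)^\alpha}{\sum_{a\in\mathcal{X}} P(a,y)^\alpha}, & \text{if } \sum_{a\in\mathcal{X}} P(a,y)^\alpha > 0,\\[2mm] 1/|\mathcal{X}|, & \text{otherwise.}\end{cases} \qquad (24)$$

The above definition simplifies many expressions in the sequel. The dependence on $\alpha$ in the mapping $P \mapsto P'$ is suppressed.

4) When $|\mathcal{Y}| = 1$, we interpret that no side information is available. Then $P$ and $Q$ may be thought of as PMFs on $\mathcal{X}$ with no reference to $\mathcal{Y}$. $P'$ and $Q'$ given by (24) are PMFs in one-to-one correspondence with $P$ and $Q$, respectively. Using the expression for Renyi entropy and (11), we have that

$$L_\alpha(P, Q) = D_{1/\alpha}(P' \,\|\, Q'), \qquad (25)$$

where $D_\beta(R \,\|\, S)$ is Renyi's information divergence of order $\beta$,

$$D_\beta(R \,\|\, S) = \frac{1}{\beta - 1}\log\left(\sum_{x\in\mathcal{X}} R(x)^\beta S(x)^{1-\beta}\right),$$

which is $\ge 0$ and equals $0$ if and only if $R = S$. For the case when $|\mathcal{Y}| = 1$ we therefore have another proof of Proposition 4.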

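5) The conditional Kullback-Leibler divergence is recovered as follows:

$$\lim_{\alpha \to 1} L_\alpha(P, Q) = \sum_{y\in\mathcal{Y}}\sum_{x\in\mathcal{X}} P(x,y)\log\frac{P(x \mid y)}{Q(x \mid y)},$$

where $Q(\cdot \mid y)$ and $P(\cdot \mid y)$ are the respective conditional PMFs of $X$ given $Y = y$.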

6) In general, La{P, Q) is not a convex function of P. Moreover, it is not, in general, a convex function of Q. 

7) In general, $L_\alpha(P, Q)$ does not satisfy the so-called data-processing inequality. More precisely, if $\mathcal{X}'$ and $\mathcal{Y}'$ are finite sets, and if $f : \mathcal{X}\times\mathcal{Y} \to \mathcal{X}'\times\mathcal{Y}'$ is a function, it is not necessarily true that $L_\alpha(P, Q) \ge L_\alpha(Pf^{-1}, Qf^{-1})$.

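8) When $|\mathcal{Y}| = 1$, i.e., in the no side information case, using (24) we can write $L_\alpha(P, Q)$ as follows:

$$L_\alpha(P, Q) = \frac{1}{\rho}\log\left[\mathrm{sign}(\rho)\cdot I_f(P' \,\|\, Q')\right], \qquad (26)$$

where $I_f(R \,\|\, S)$ is the $f$-divergence [17] given by

$$I_f(R \,\|\, S) = \sum_{x\in\mathcal{X}} S(x)\, f\!\left(\frac{R(x)}{S(x)}\right), \qquad (27)$$

with

$$f(x) = \mathrm{sign}(\rho)\cdot x^{1+\rho}, \quad x \ge 0. \qquad (28)$$

Since $f$ is a strictly convex function for $\rho \neq 0$, an application of Jensen's inequality in (27) indicates that

$$I_f(R \,\|\, S) \ge f(1) = \begin{cases} 1, & \rho > 0,\\ -1, & -1 < \rho < 0.\end{cases} \qquad (29)$$

Moreover, when $-1 < \rho < 0$, we have the following bounds:

$$-1 \le I_f(R \,\|\, S) \le 0. \qquad (30)$$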
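9) Let us define

$$h(P) \triangleq \sum_{y\in\mathcal{Y}}\left(\sum_{x\in\mathcal{X}} P(x,y)^\alpha\right)^{1/\alpha}.$$

The dependence of $h$ on $\alpha$ is understood, and suppressed for convenience. Clearly,

$$H_\alpha(P) = \frac{\alpha}{1-\alpha}\log h(P). \qquad (31)$$

Motivated by the relationship in (26), let us write $L_\alpha$ in the general case as follows:

$$L_\alpha(P, Q) = \frac{1}{\rho}\log\left[\mathrm{sign}(\rho)\cdot I(P, Q)\right], \qquad (32)$$

where $I(P, Q)$ is given by

$$I(P, Q) = \frac{\mathrm{sign}(\rho)}{h(P)}\sum_{y\in\mathcal{Y}}\sum_{x\in\mathcal{X}} P(x,y)\left[\sum_{a\in\mathcal{X}}\left(\frac{Q(a,y)}{Q(x,y)}\right)^{\alpha}\right]^{\rho}. \qquad (33)$$

These expressions turn out to be useful in the sequel. It is not difficult to show that

$$I(P, Q) = \sum_{y\in\mathcal{Y}} w(y)\, I_f\big(P'(\cdot \mid y) \,\|\, Q'(\cdot \mid y)\big),$$

where $w$ is the PMF on $\mathcal{Y}$ given by

$$w(y) = \frac{\left(\sum_{x\in\mathcal{X}} P(x,y)^\alpha\right)^{1/\alpha}}{h(P)}. \qquad (34)$$

Consequently, the bounds given in (29) and (30) are valid for $I(P, Q)$, under corresponding conditions on $\alpha$.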

10) Inequalities involving $L_\alpha$ result in inequalities involving $I$ with ordering preserved. More precisely, for $r > 0$, if $L_\alpha(P, Q) \le r$, then $I(P, Q) \le t$, for $t = \mathrm{sign}(\rho)\cdot 2^{\rho r}$.

11) From the known bounds $0 \le H_\alpha(P) \le \log|\mathcal{X}|$, it is easy to see the following bounds:

$$1 \le h(P) \le |\mathcal{X}|^\rho, \quad \text{for } 0 < \alpha < 1, \qquad (35)$$

and

$$|\mathcal{X}|^\rho \le h(P) \le 1, \quad \text{for } 1 < \alpha < \infty. \qquad (36)$$

In both cases, we see that $h(P)$ is bounded away from 0 and therefore (33) and (34) are well-defined.
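Several of the identities and bounds in the list above are easy to spot-check numerically. The Python sketch below assumes hypothetical toy PMFs stored as `P[y][x]` and base-2 logarithms. It compares (11) with the Renyi-divergence form (25) when $|\mathcal{Y}| = 1$, compares (11) with (32) using the weighted form of $I(P,Q)$ built from (34), exposes $I(P,Q)$ and $h(P)$ so that (29), (30), (35) and (36) can be inspected, and evaluates $L_\alpha$ on either side of $\alpha = 1$ against the conditional Kullback-Leibler divergence of item 5). All helper names are illustrative; these are checks on one example, not proofs.

```python
import math

def sgn(rho):
    return math.copysign(1.0, rho)

def h(P, alpha):
    """h(P) from item 9); P[y][x] = P(x, y)."""
    return sum(sum(p ** alpha for p in row) ** (1 / alpha) for row in P)

def L_alpha(P, Q, alpha):
    """L_alpha(P, Q) from (11) in bits; Q must have full support."""
    rho = 1 / alpha - 1
    total = sum(p * sum((qa / q) ** alpha for qa in qrow) ** rho
                for prow, qrow in zip(P, Q) for p, q in zip(prow, qrow))
    return math.log2(total) / rho - alpha / (1 - alpha) * math.log2(h(P, alpha))

def tilt(row, alpha):
    """Tilted conditional PMF (24) for one value of y."""
    s = sum(p ** alpha for p in row)
    return [p ** alpha / s for p in row] if s > 0 else [1 / len(row)] * len(row)

def renyi_div(R, S, beta):
    """Renyi divergence D_beta(R || S) in bits (S with full support)."""
    return math.log2(sum(r ** beta * s ** (1 - beta) for r, s in zip(R, S))) / (beta - 1)

def f_div(R, S, rho):
    """I_f(R || S) with f(x) = sign(rho) * x^(1 + rho), as in (27)-(28)."""
    return sgn(rho) * sum(s * (r / s) ** (1 + rho) for r, s in zip(R, S))

def I_weighted(P, Q, alpha):
    """I(P, Q) computed as sum_y w(y) I_f(P'(.|y) || Q'(.|y)), with w from (34)."""
    rho = 1 / alpha - 1
    w = [sum(p ** alpha for p in row) ** (1 / alpha) / h(P, alpha) for row in P]
    return sum(wy * f_div(tilt(pr, alpha), tilt(qr, alpha), rho) for wy, pr, qr in zip(w, P, Q))

P = [[0.30, 0.10, 0.05], [0.05, 0.20, 0.30]]   # hypothetical joint PMF
Q = [[0.10, 0.20, 0.20], [0.20, 0.20, 0.10]]   # hypothetical PMF with full support
p1, q1 = [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]       # hypothetical PMFs for |Y| = 1

for alpha in (0.5, 2.0):
    rho = 1 / alpha - 1
    print("alpha =", alpha)
    print("  (25):", L_alpha([p1], [q1], alpha), renyi_div(tilt(p1, alpha), tilt(q1, alpha), 1 / alpha))
    print("  (32):", L_alpha(P, Q, alpha), math.log2(sgn(rho) * I_weighted(P, Q, alpha)) / rho)
    print("  I(P, Q) =", I_weighted(P, Q, alpha), " h(P) =", h(P, alpha))

# Item 5): conditional Kullback-Leibler divergence (bits) versus L_alpha near alpha = 1.
kl = sum(p * math.log2((p / sum(pr)) / (q / sum(qr)))
         for pr, qr in zip(P, Q) for p, q in zip(pr, qr))
print("KL:", kl, L_alpha(P, Q, 1 - 1e-4), L_alpha(P, Q, 1 + 1e-4))
```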

The quantity $L_\alpha(P, Q)$ does not have many of the useful properties enjoyed by the Kullback-Leibler divergence, or other $f$-divergences, even in the case when $|\mathcal{Y}| = 1$. See, for example, comments 6) and 7) made earlier in this section. However, it behaves like squared distance and shares a "Pythagorean" property with the Kullback-Leibler divergence. This is explored in Section VIII.



VI. Lq-center and radius of a family 

In this section we identify the -center and radius of a family. We first begin with a finite family and subsequently study 
an arbitrary family (that satisfies some measurability conditions). We finally conclude the section with a proof of Theorem II 01 



A. $L_\alpha$-center and radius of a finite family

Let $|\mathcal{T}|$ be finite. For simplicity, assume that no side information is available. We will therefore use $\mathcal{X}$ instead of the cumbersome $\mathcal{X} \times \mathcal{Y}$. Our main goals here are to verify, using known results, that the $L_\alpha$-center exists, is unique, and lies in the closure of the convex hull of $\mathcal{T}$. We then briefly touch upon connections with Gallager exponents, capacity of order $1/\alpha$, and information radius of order $1/\alpha$. The development in this subsection will suggest an approach to prove Theorem 10 for the case when $|\mathcal{T}|$ is infinite.

1) Proof of Theorem 10 for a finite family of PMFs: Let $\mathcal{T} = \{P_1, \ldots, P_m\}$ be PMFs on $\mathcal{X}$. The problem of identifying the $L_\alpha$-center and radius can be solved by identifying the $D_{1/\alpha}$-center and radius of the tilted family of PMFs $\{P_i' \mid 1 \le i \le m\}$, where the invertible transformation $Q \mapsto Q'$ is given by (24). Moreover, from (25) and (26), we have

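$$\inf_{Q} \max_{1 \le i \le m} L_\alpha(P_i, Q) \tag{37}$$

$$= \inf_{Q} \max_{1 \le i \le m} D_{1+\rho}(P_i' \,\|\, Q') \tag{38}$$

$$= \frac{1}{\rho} \log \Bigl( \mathrm{sign}(\rho) \inf_{Q} \max_{1 \le i \le m} I_f(P_i' \,\|\, Q') \Bigr). \tag{39}$$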

Csiszár considered the evaluation of (38) in [15, Proposition 1], and the evaluation of the inf-max within parentheses in (39) in [17].

From [17, Theorem 3.2] and its Corollary, there exists a unique PMF $(Q')^*$ on $\mathcal{X}$ that minimizes $\max_{1 \le i \le m} I_f(P_i' \,\|\, Q')$. Their application requires that $f$ be strictly convex and $f(0) < \infty$; these conditions clearly hold since $\rho \neq 0$ and $f(0) = 0$. From the bijectivity of the mapping $Q \mapsto Q'$, the infima in (37), (38), and (39) can all be replaced by minima. From the inverse of the map $Q \mapsto Q'$, we obtain the unique minimizer $Q^*$ for (37). This proves the existence and uniqueness result of Theorem 10 when $|\mathcal{T}|$ is finite.

2) Minimizer is in the convex hull: Let $\mathcal{E}$ be the convex hull of $\mathcal{T}$. That the minimizer $Q^*$ is in the convex hull of the family, i.e., $Q^* \in \mathcal{E}$, can be gleaned from [17, Equation 2.25], [17, Theorem 3.2], and its Corollary. Indeed, [17, Theorem 3.2] assures that

$$\min_{Q'} \max_{1 \le i \le m} I_f(P_i' \,\|\, Q') \tag{40}$$

$$= \max_{\mu} \min_{Q'} \sum_{i=1}^{m} \mu(i) \, I_f(P_i' \,\|\, Q'), \tag{41}$$

where the max-min in (41) is achieved at $(\mu^*, Q'^*)$, and $Q'^*$ is the PMF that attains the min-max in (40). We now seek to find out the nature of $Q'^*$ and thence $Q^*$.

For any arbitrary weight function $\mu$, we have from [17, Equation 2.25] that the $Q'$ which minimizes

$$\sum_{i=1}^{m} \mu(i) \, I_f(P_i' \,\|\, Q') \tag{42}$$

is

$$Q'(x) = c^{-1} \Bigl( \sum_{i=1}^{m} \mu(i) \, P_i'(x)^{1+\rho} \Bigr)^{1/(1+\rho)} \tag{43}$$

for every $x \in \mathcal{X}$, where $c$ is the normalizing constant. From the correspondence between the primed and the unprimed PMFs, and (43), we obtain that the corresponding $Q$ is

$$Q(x) = d^{-1} \sum_{i=1}^{m} \frac{\mu(i)}{h(P_i)} P_i(x), \quad \forall x \in \mathcal{X}, \tag{45}$$

where $d$ is the normalizing constant

$$d = \sum_{i=1}^{m} \frac{\mu(i)}{h(P_i)}. \tag{46}$$

Thus, for an arbitrary $\mu$, the $Q$ (obtained from $Q'$) that minimizes (42) is in the convex hull $\mathcal{E}$. In particular, the minimizing $Q^*$ corresponding to the $\mu^*$ that attains the max-min objective in (41), and therefore the min-max objective in (40), is also in $\mathcal{E}$. This result will be proved in wider generality in Section VIII.
With some algebra, we can further show that

$$C = \min_{Q} \max_{1 \le i \le m} L_\alpha(P_i, Q) = \frac{\alpha}{1-\alpha} \log \bigl( d \cdot h(Q^*) \bigr), \tag{47}$$

where $Q^*$ is given by (45) and $d$ by (46) with $\mu = \mu^*$.
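Here is a minimal sketch of (43), (45) and (46) for a finite family without side information. It assumes the map (24) is $P'(x)=P(x)^{\alpha}/\sum_{a}P(a)^{\alpha}$, so that its inverse raises a PMF to the power $1/\alpha=1+\rho$ and renormalizes. For a fixed weight vector $\mu$ it computes the minimizer of (42) from (43), maps it back, and compares the result with the mixture (45). The minimum value of (42) should equal $\mathrm{sign}(\rho)\,d\,h(Q)$; at $\mu=\mu^*$ this is the quantity whose logarithm appears in (47). The family, the weights and the function names are made up for illustration.

```python
import random


def h(P, alpha):
    """h(P) = (sum_x P(x)^alpha)^(1/alpha) (no side information)."""
    return sum(p ** alpha for p in P) ** (1.0 / alpha)


def tilt(P, alpha):
    """Assumed form of the map (24): P'(x) = P(x)^alpha / sum_a P(a)^alpha."""
    z = sum(p ** alpha for p in P)
    return [p ** alpha / z for p in P]


def untilt(Pt, alpha):
    """Inverse of tilt: P(x) proportional to P'(x)^(1/alpha)."""
    z = sum(p ** (1.0 / alpha) for p in Pt)
    return [p ** (1.0 / alpha) / z for p in Pt]


def I_f(R, S, rho):
    """f-divergence with f(t) = sign(rho) * t^(1+rho)."""
    sgn = 1.0 if rho > 0 else -1.0
    return sgn * sum(r ** (1 + rho) * s ** (-rho) for r, s in zip(R, S))


def objective_42(mu, family, Qt, alpha):
    """sum_i mu(i) I_f(P_i' || Q')."""
    rho = 1.0 / alpha - 1.0
    return sum(m * I_f(tilt(P, alpha), Qt, rho) for m, P in zip(mu, family))


def minimizer_43(mu, family, alpha):
    """Q'(x) proportional to (sum_i mu(i) P_i'(x)^(1+rho))^(1/(1+rho))."""
    rho = 1.0 / alpha - 1.0
    tilted = [tilt(P, alpha) for P in family]
    raw = [sum(m * Pt[x] ** (1 + rho) for m, Pt in zip(mu, tilted)) ** (1 / (1 + rho))
           for x in range(len(family[0]))]
    c = sum(raw)
    return [r / c for r in raw]


def mixture_45(mu, family, alpha):
    """Q(x) = d^{-1} sum_i mu(i) P_i(x) / h(P_i), with d from (46)."""
    d = sum(m / h(P, alpha) for m, P in zip(mu, family))
    Q = [sum(m * P[x] / h(P, alpha) for m, P in zip(mu, family)) / d
         for x in range(len(family[0]))]
    return Q, d


family = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.3, 0.4, 0.3]]
mu = [0.5, 0.3, 0.2]
alpha = 0.5
Q, d = mixture_45(mu, family, alpha)
Qt = minimizer_43(mu, family, alpha)
print(untilt(Qt, alpha), Q)  # the same PMF up to rounding
rng = random.Random(0)
r = [rng.random() for _ in range(3)]
other = [0.9 * q + 0.1 * x / sum(r) for q, x in zip(Qt, r)]
print(objective_42(mu, family, other, alpha) >= objective_42(mu, family, Qt, alpha))
```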
3) Necessary and sufficient conditions for finding the $L_\alpha$-center and radius: From [17, Theorem 3.2], a weight vector $\mu$ maximizes (41) if and only if

$$I_f(P_i' \,\|\, Q') \le K, \quad i = 1, 2, \ldots, m, \tag{48}$$

where equality holds whenever $\mu(i) > 0$, and $Q'$ is given by (43). Under this condition, clearly, the corresponding $Q$ given by (45) is the $L_\alpha$-center and $C = (1/\rho) \log(\mathrm{sign}(\rho) \cdot K)$ is the $L_\alpha$-radius.

An interesting special case occurs when $h(P_i)$ does not depend on $i$. Then we may simplify (45) to

$$Q = \sum_{i=1}^{m} \mu(i) P_i, \tag{49}$$

i.e., the weights that make the optimum mixture (of PMFs) are the same as the given weights that form the objective function in (42).
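For a two-element family, the condition (48) can be explored with a one-dimensional search. With $Q$ given by (45), a short computation shows that the value of (42) at its minimizer equals $J(\mu)=\mathrm{sign}(\rho)\,h\bigl(\sum_{i}\mu(i)P_i/h(P_i)\bigr)$. This is concave in $\mu$ because $\mathrm{sign}(\rho)h$ is concave. The sketch below assumes no side information, keeps the tilting assumption $P'\propto P^{\alpha}$, and uses hypothetical helper names. It maximizes $J$ by ternary search and reads off $K=J(\mu^*)$ and $C=(1/\rho)\log(\mathrm{sign}(\rho)K)$. One can then compare $I(P_1,Q^*)$, $I(P_2,Q^*)$ and $K$, and check $C$ against a brute-force grid minimization of $\max_i L_\alpha(P_i,Q)$ over binary $Q$.

```python
import math


def h(P, alpha):
    return sum(p ** alpha for p in P) ** (1.0 / alpha)


def tilt(P, alpha):
    z = sum(p ** alpha for p in P)
    return [p ** alpha / z for p in P]


def I(P, Q, alpha):
    """I(P,Q) = sign(rho)/h(P) * sum_x P(x) Q'(x)^(-rho); equals I_f(P'||Q') here."""
    rho = 1.0 / alpha - 1.0
    sgn = 1.0 if rho > 0 else -1.0
    return sgn / h(P, alpha) * sum(p * q ** (-rho) for p, q in zip(P, tilt(Q, alpha)))


def L(P, Q, alpha):
    """L_alpha(P,Q) = (1/rho) log(sign(rho) I(P,Q))."""
    rho = 1.0 / alpha - 1.0
    sgn = 1.0 if rho > 0 else -1.0
    return math.log(sgn * I(P, Q, alpha)) / rho


def center_of_pair(P1, P2, alpha, iters=200):
    """Maximize J(mu) = sign(rho) h(mu f(P1) + (1-mu) f(P2)) by ternary search."""
    rho = 1.0 / alpha - 1.0
    sgn = 1.0 if rho > 0 else -1.0
    f1 = [p / h(P1, alpha) for p in P1]
    f2 = [p / h(P2, alpha) for p in P2]

    def J(m):
        return sgn * h([m * u + (1 - m) * v for u, v in zip(f1, f2)], alpha)

    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if J(m1) < J(m2):
            lo = m1
        else:
            hi = m2
    mu = (lo + hi) / 2
    F = [mu * u + (1 - mu) * v for u, v in zip(f1, f2)]
    Q = [x / sum(F) for x in F]      # (45), with d = sum(F) as in (46)
    K = J(mu)
    C = math.log(sgn * K) / rho      # L_alpha-radius
    return mu, Q, K, C


P1, P2 = [0.9, 0.1], [0.3, 0.7]
mu, Q, K, C = center_of_pair(P1, P2, 0.5)
print(mu, Q, I(P1, Q, 0.5), I(P2, Q, 0.5), K, C)
```

At the optimum both weights are positive, so (48) holds with equality for both members, and a grid search over $Q$ returns the same radius up to the grid resolution.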

4) Relationship with Gallager exponent: For the set of PMFs $\{P_i \mid 1 \le i \le m\}$, the tilted set $\{P_i' \mid 1 \le i \le m\}$ can be considered as a channel with input alphabet $\{1, 2, \ldots, m\}$ and output alphabet $\mathcal{X}$. This channel will be represented as $P'$.

From the remarks in [15] on the connection between the information radius of order $1/\alpha$ and the Gallager exponent of the channel $P'$, and from [15, Proposition 1], we have

$$\min_{Q} \max_{1 \le i \le m} L_\alpha(P_i, Q) = \max_{\mu} \frac{E_0(\alpha - 1, \mu, P')}{\alpha - 1},$$

where the right-hand side is the maximized Gallager exponent of the channel $P'$. The range $1 < \alpha < 2$ is relevant in [18, p. 138], $1 < \alpha < \infty$ in [18, p. 157], and $0 < \alpha < 1$ in [19].
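The Gallager-exponent form can be compared numerically with a direct min-max computation. The sketch below uses Gallager's function $E_0(\rho_g,\mu,W)=-\log\sum_{x}\bigl(\sum_{i}\mu(i)W_i(x)^{1/(1+\rho_g)}\bigr)^{1+\rho_g}$ with $\rho_g=\alpha-1$ and $W_i=P_i'$. It again assumes $P'\propto P^{\alpha}$ and no side information. For a two-element family it maximizes $E_0(\alpha-1,\mu,P')/(\alpha-1)$ over $\mu$ by ternary search. Here the objective equals $\frac{1}{\rho}\log h(\sum_i\mu(i)P_i/h(P_i))$, which is unimodal in $\mu$. The function names are hypothetical.

```python
import math


def h(P, alpha):
    return sum(p ** alpha for p in P) ** (1.0 / alpha)


def tilt(P, alpha):
    z = sum(p ** alpha for p in P)
    return [p ** alpha / z for p in P]


def L(P, Q, alpha):
    """L_alpha(P,Q) = (1/rho) log(sign(rho) I(P,Q)) without side information."""
    rho = 1.0 / alpha - 1.0
    sgn = 1.0 if rho > 0 else -1.0
    I = sgn / h(P, alpha) * sum(p * q ** (-rho) for p, q in zip(P, tilt(Q, alpha)))
    return math.log(sgn * I) / rho


def E0(rho_g, mu, W):
    """Gallager's E_0(rho_g, mu, W) = -log sum_x (sum_i mu_i W_i(x)^(1/(1+rho_g)))^(1+rho_g)."""
    return -math.log(sum(
        sum(m * Wi[x] ** (1.0 / (1 + rho_g)) for m, Wi in zip(mu, W)) ** (1 + rho_g)
        for x in range(len(W[0]))))


def gallager_radius(family, alpha, iters=200):
    """max over mu = (t, 1-t) of E_0(alpha-1, mu, P')/(alpha-1) for a two-element family."""
    W = [tilt(P, alpha) for P in family]

    def g(t):
        return E0(alpha - 1, [t, 1 - t], W) / (alpha - 1)

    lo, hi = 0.0, 1.0
    for _ in range(iters):
        t1, t2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if g(t1) < g(t2):
            lo = t1
        else:
            hi = t2
    return g((lo + hi) / 2)


family = [[0.9, 0.1], [0.3, 0.7]]
print(gallager_radius(family, 0.5))
```

Comparing `gallager_radius(family, alpha)` with a grid minimization of $\max_i L_\alpha(P_i,Q)$, using the `L` helper, should give agreement up to the grid step.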

5) The max-min problem for $L_\alpha$: Thus far our focus has been on the min-max problem of finding the $L_\alpha$-center. We briefly looked at identifying the max-min value of $I_f$ in (41), but only as a means to study the min-max problem. We now make some remarks about the max-min problem for the finite family case. Its extension to arbitrary uncertainty sets is not considered in this paper.

Suppose that our new objective is to find

$$\max_{\mu} \min_{Q} \sum_{i=1}^{m} \mu(i) \, L_\alpha(P_i, Q). \tag{50}$$

This problem is the same as identifying the "capacity of order $1/\alpha$" of the channel $P'$ [15], i.e.,

$$\max_{\mu} \min_{Q'} \sum_{i=1}^{m} \mu(i) \, D_{1/\alpha}(P_i' \,\|\, Q').$$

[15, Proposition 1] solves this problem; the value is the same as the min-max value

$$\min_{Q'} \max_{1 \le i \le m} D_{1/\alpha}(P_i' \,\|\, Q').$$

Consequently, the max-min value of (50) is the same as the $L_\alpha$-radius of the family.



B. $L_\alpha$-center and radius for an arbitrary family

We are now back to the case with side information and an infinite family $\mathcal{T}$. The development in this subsection will be analogous to Gallager's approach [12] for source compression. We first recall the technical condition indicated in Section IV. $\mathcal{T}$ is a family of PMFs on $\mathcal{X} \times \mathcal{Y}$, and $(\mathcal{T}, \mathscr{T})$ is a measurable space. For every $(x,y) \in \mathcal{X} \times \mathcal{Y}$, the mapping $P \mapsto P(x,y)$ is $\mathscr{T}$-measurable.

Our focus will be on the following:

Definition 12: For $0 < \alpha < \infty$, $\alpha \neq 1$,

$$K_+ = \min_{Q} \sup_{P \in \mathcal{T}} I(P,Q). \tag{51}$$

Taking $Q$ to be the uniform PMF on $\mathcal{X} \times \mathcal{Y}$, it is easy to check that $K_+$ is finite. Indeed, $1 \le K_+ \le |\mathcal{X}|^{\rho}$ when $\rho > 0$, and $-1 \le K_+ \le 0$ when $-1 < \rho < 0$.

Let us define some other auxiliary quantities. Define the mapping $f : \mathcal{T} \to \mathbb{R}^{|\mathcal{X}||\mathcal{Y}|}$ as follows:

$$f(P) \triangleq P / h(P).$$

For a probability measure $\mu$ on $(\mathcal{T}, \mathscr{T})$, let

$$F \triangleq \int_{\mathcal{T}} d\mu(P) \, f(P). \tag{52}$$

We define the PMF $\mu_f \in \mathcal{P}(\mathcal{X} \times \mathcal{Y})$ as the scaled version of $F$,

$$\mu_f = d^{-1} F, \tag{53}$$

where $d$, as in the finite case, is the normalizing constant

$$d = \int_{\mathcal{T}} \frac{d\mu(P)}{h(P)}. \tag{54}$$

These definitions are extensions of (45) and (46) to arbitrary $\mathcal{T}$. Moreover, let

$$J(\mu, \mathcal{T}) = \int_{\mathcal{T}} d\mu(P) \, I(P, \mu_f). \tag{55}$$

Simple algebraic manipulations result in

$$J(\mu, \mathcal{T}) = \mathrm{sign}(\rho) \cdot h(F) \tag{56}$$

$$= \mathrm{sign}(\rho) \cdot d \cdot h(\mu_f), \tag{57}$$

an extension of [17, Equation (2.24)] for arbitrary $\mathcal{T}$. The following auxiliary quantity will be useful.

Definition 13: For $0 < \alpha < \infty$, $\alpha \neq 1$,

$$K_- = \sup_{\mu} J(\mu, \mathcal{T}). \tag{58}$$

The quantity $\mu_f$ in (53) is analogous to the PMF at the output of a channel represented by $\mathcal{T}$ when the input measure is $\mu$. $J(\mu, \mathcal{T})$ in (55) is the analogue of mutual information; Csiszár calls it informativity in his work on finite-sized families [17].

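Proposition 14: $K_- \le K_+$.

Proof: Fix an arbitrary PMF $Q$ on $\mathcal{X} \times \mathcal{Y}$. It is straightforward to show that [17, Equation 2.26] holds even when $|\mathcal{T}|$ is not finite, and is given by

$$\int_{\mathcal{T}} d\mu(P) \, I(P,Q) = \mathrm{sign}(\rho) \cdot J(\mu, \mathcal{T}) \cdot I(\mu_f, Q).$$

We have $I(\mu_f, Q) \ge \mathrm{sign}(\rho)$, with equality when $Q = \mu_f$. Also, $\mathrm{sign}(\rho) \cdot J(\mu, \mathcal{T}) = d \cdot h(\mu_f) > 0$ by (57). It follows that

$$\int_{\mathcal{T}} d\mu(P) \, I(P,Q) \ge J(\mu, \mathcal{T}).$$

Consequently,

$$J(\mu, \mathcal{T}) = \min_{Q} \int_{\mathcal{T}} d\mu(P) \, I(P,Q),$$

which leads to

$$\begin{aligned}
K_- &= \sup_{\mu} J(\mu, \mathcal{T}) \\
&= \sup_{\mu} \min_{Q} \int_{\mathcal{T}} d\mu(P) \, I(P,Q) \\
&\le \min_{Q} \sup_{\mu} \int_{\mathcal{T}} d\mu(P) \, I(P,Q) \\
&= \min_{Q} \sup_{P \in \mathcal{T}} I(P,Q) \\
&= K_+.
\end{aligned}$$

■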
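The identities (56) and (57), and the equation used in the proof of Proposition 14, are easy to test when $\mu$ is supported on finitely many PMFs, so that the integrals become sums. The sketch below assumes the same forms of $h$ and $I$ as in 9), with the tilted conditional $Q'(x\mid y)\propto Q(x,y)^{\alpha}$, and uses hypothetical helper names. `output_pmf` returns $F$, $d$ and $\mu_f$ from (52)-(54), and `informativity` returns the discrete version of $J(\mu,\mathcal{T})$ from (55).

```python
import random


def h(P, alpha):
    """h(P) = sum_y (sum_x P(x,y)^alpha)^(1/alpha); P is a list of rows P[y][x]."""
    return sum(sum(p ** alpha for p in row) ** (1.0 / alpha) for row in P)


def I(P, Q, alpha):
    """Assumed form: sign(rho)/h(P) * sum_{x,y} P(x,y) Q'(x|y)^(-rho), Q'(x|y) ~ Q(x,y)^alpha."""
    rho = 1.0 / alpha - 1.0
    sgn = 1.0 if rho > 0 else -1.0
    total = 0.0
    for prow, qrow in zip(P, Q):
        z = sum(q ** alpha for q in qrow)
        total += sum(p * (q ** alpha / z) ** (-rho) for p, q in zip(prow, qrow))
    return sgn * total / h(P, alpha)


def output_pmf(mu, family, alpha):
    """F = sum_P mu(P) P/h(P) as in (52); returns (F, d, mu_f) with mu_f = F/d."""
    ny, nx = len(family[0]), len(family[0][0])
    F = [[sum(m * P[y][x] / h(P, alpha) for m, P in zip(mu, family))
          for x in range(nx)] for y in range(ny)]
    d = sum(map(sum, F))
    return F, d, [[v / d for v in row] for row in F]


def informativity(mu, family, alpha):
    """J(mu, T) = sum_P mu(P) I(P, mu_f), the discrete version of (55)."""
    _, _, mu_f = output_pmf(mu, family, alpha)
    return sum(m * I(P, mu_f, alpha) for m, P in zip(mu, family))


def random_joint_pmf(nx, ny, rng):
    vals = [[rng.random() + 1e-3 for _ in range(nx)] for _ in range(ny)]
    z = sum(map(sum, vals))
    return [[v / z for v in row] for row in vals]


rng = random.Random(0)
family = [random_joint_pmf(3, 2, rng) for _ in range(3)]
print(informativity([0.2, 0.3, 0.5], family, 0.5))
```

For random $Q$, the weighted sum $\sum_P\mu(P)I(P,Q)$ should match $\mathrm{sign}(\rho)\,J(\mu,\mathcal{T})\,I(\mu_f,Q)$ and never fall below $J(\mu,\mathcal{T})$.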

The following proposition is similar to [12, Theorem A]. The proof largely runs along similar lines.

Proposition 15: A real number $R$ equals $K_-$ if and only if there exist a sequence of probability measures $(\mu_n : n \in \mathbb{N})$ on $(\mathcal{T}, \mathscr{T})$ and a PMF $Q^*$ on $\mathcal{X} \times \mathcal{Y}$ with the following properties:

1) $\lim_n J(\mu_n, \mathcal{T}) = R$;

2) $\lim_n \mu_{n,f} = Q^*$;

3) $I(P, Q^*) \le R$ for every $P \in \mathcal{T}$.

Furthermore, $Q^*$ is unique, attains the minimum in (51), and $K_- = K_+$. □

Proof: Observe that on account of 1), 3), and Proposition 14, we have

$$K_- \ge R \ge \sup_{P \in \mathcal{T}} I(P, Q^*) \ge \min_{Q} \sup_{P \in \mathcal{T}} I(P,Q) = K_+ \ge K_-,$$

where the first inequality follows from 1), the second from 3), and the last from Proposition 14. Consequently, all the inequalities are equalities, $R = K_- = K_+$, and the use of "min" in the definition of $K_+$ is justified.

Conversely, let $R = K_-$. Since $K_- \le K_+ < \infty$, it follows from the definition of $K_-$ that there exists a sequence $(\mu_n : n \in \mathbb{N})$ such that $\lim_n J(\mu_n, \mathcal{T}) = R$.

Now consider the sequence of vectors in $\mathbb{R}^{|\mathcal{X}||\mathcal{Y}|}$ given by $F_n = \int_{\mathcal{T}} d\mu_n(P) \, f(P)$. This is a sequence of scaled PMFs given by $F_n = d_n \cdot \mu_{n,f}$, where $d_n$ is given by (54). The sequence resides in a compact space of scaled PMFs and therefore has a cluster point $F^*$, which can be normalized to get the PMF $Q^*$. Moreover, we can find a subsequence $(F_{n_k} : k \in \mathbb{N})$ such that $\lim_k F_{n_k} = F^*$. We redefine the sequence $\mu_n$ as given by this subsequence, and properties 1) and 2) hold.

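Suppose now that there is a $P_0 \in \mathcal{T}$ such that 3) is violated, i.e.,

$$I(P_0, Q^*) > K_-.$$

Consider the convex combinations of measures

$$\mu_{n,\lambda} = (1 - \lambda)\mu_n + \lambda \delta_{P_0}, \tag{59}$$

where $\delta_{P_0}$ is the atomic distribution at $P_0$. From (59), (52), and (56), we have

$$s_n(\lambda) \triangleq J(\mu_{n,\lambda}, \mathcal{T}) = \mathrm{sign}(\rho) \cdot h\bigl( (1-\lambda) F_n + \lambda f(P_0) \bigr).$$

Since $\mathrm{sign}(\rho) h(\cdot)$ is a concave and therefore continuous function of its vector-valued argument, $s_n(\lambda)$ converges pointwise to

$$s(\lambda) = \mathrm{sign}(\rho) \cdot h\bigl( (1-\lambda) F^* + \lambda f(P_0) \bigr),$$

for $\lambda \in [0,1]$. In particular, $s(0) = \lim_n s_n(0) = K_-$. Moreover, $s(\lambda)$ is a concave function of $\lambda$, since $\mathrm{sign}(\rho) h(\cdot)$ is concave and its argument is linear in $\lambda$. Let $\dot{s}(0)$ be the one-sided derivative of $s(\lambda)$ evaluated at $\lambda = 0$ (i.e., the limit as $\lambda \downarrow 0$). We can straightforwardly check that

$$\dot{s}(0) = I(P_0, Q^*) - K_- > 0,$$

with the possibility that the value (slope at $\lambda = 0$) may be $+\infty$.

We have therefore established that $s(0) = K_-$, that $s(\lambda)$ is concave and therefore continuous in $[0,1]$, and that it has strictly positive slope at $\lambda = 0$. Consequently, $s(\lambda) > K_-$ for some $0 < \lambda < 1$. Since

$$J(\mu_{n,\lambda}, \mathcal{T}) = s_n(\lambda) \to s(\lambda) > K_-,$$

we have $J(\mu_{n,\lambda}, \mathcal{T}) > K_-$ for all sufficiently large $n$, which contradicts the definition of $K_-$. Hence 3) must hold.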

To show uniqueness of $Q^*$, suppose there were another $R^*$ and another sequence of measures $(\pi_n : n \in \mathbb{N})$ satisfying 1), 2), and 3). We get two cluster points $F^*$ and $G^*$ that, when normalized, lead to $Q^*$ and $R^*$, respectively. Then with

$$\nu_n = \tfrac{1}{2} \mu_n + \tfrac{1}{2} \pi_n,$$

we have

$$\lim_n J(\nu_n, \mathcal{T}) = \mathrm{sign}(\rho) \cdot h\bigl( \tfrac{1}{2} F^* + \tfrac{1}{2} G^* \bigr) > \tfrac{1}{2} \mathrm{sign}(\rho) h(F^*) + \tfrac{1}{2} \mathrm{sign}(\rho) h(G^*) = K_-,$$

a contradiction. The strict inequality above is due to the strict concavity of $\mathrm{sign}(\rho) h(\cdot)$ when $\rho > -1$ and $\rho \neq 0$. ■

C. Proof of Theorem 10

Proof: From (32) it is clear that

$$C = \frac{1}{\rho} \log\bigl( \mathrm{sign}(\rho) \cdot K_+ \bigr).$$

$Q$ attains the min-sup value in Definition 12 if and only if $Q$ attains the min-sup value $C$ in Definition 5. Proposition 15 guarantees the existence and uniqueness of such a $Q$. ■



VII. Examples 

In this section we look at two example families of PMFs, and identify their $L_\alpha$-centers and radii. We focus on guessing without side information. We also take a closer look at the binary memoryless source and obtain tighter upper bounds on redundancy than those obtained via Theorem 16. Throughout this section, therefore, $0 < \alpha < 1$ and $|\mathcal{Y}| = 1$. The uncertainty set will thus be a set of PMFs on $\mathcal{X}$ (with no reference to $\mathcal{Y}$).



A. The family of discrete memoryless sources 

Let $\mathcal{A}$ be a finite alphabet, $n$ a positive integer, and $\mathcal{X} = \mathcal{A}^n$. We wish to guess $n$-strings with letters drawn from $\mathcal{A}$. Let $a^n = (a_1, \ldots, a_n) \in \mathcal{A}^n$. Let $\mathcal{P}(\mathcal{X})$ denote the set of all PMFs on $\mathcal{X}$. Let $\mathcal{T}$ be the set of all discrete memoryless sources (DMSs) on $\mathcal{A}$, i.e.,

$$\mathcal{T} = \Bigl\{ P_n \in \mathcal{P}(\mathcal{A}^n) \;\Big|\; P_n(a^n) = \prod_{i=1}^{n} P(a_i) \ \forall a^n \in \mathcal{A}^n, \text{ and } P \in \mathcal{P}(\mathcal{A}) \Bigr\}.$$

The parameters of the source $P$ are unknown to the guesser. Arikan and Merhav [6] provide a guessing scheme for this uncertainty set. The scheme happens to be independent of $\rho$. Moreover, their guessing scheme has the same asymptotic performance as the optimal guessing scheme. Their guessing order proceeds in the increasing order of empirical entropies. Strings with identical letters are guessed first, then strings with exactly one different letter, and so on. Within each type of sequence, the order of guessing is irrelevant. Denote this guessing list by $G_n$. Arikan and Merhav [6, Theorem 1] showed that for any $P_n \in \mathcal{T}$,

$$\lim_{n \to \infty} \frac{1}{n} R(P_n, G_n) = 0.$$

The above result is couched in our notation. This indicates that $\mathcal{T}$, the family of all DMSs on $\mathcal{A}$, is not rich enough, in the sense that there exists a "universal" guessing scheme. The following result makes this notion more precise.

Theorem 16: (Family of DMSs on $\mathcal{A}$) Let $m = |\mathcal{A}|$. The $L_\alpha$-radius $C_n$ of the family of discrete memoryless sources on $\mathcal{A}$ satisfies

$$C_n \le \frac{m-1}{2} \log \frac{n}{2\pi} + u_m + \varepsilon_n,$$

where $u_m = \log\bigl( \Gamma(1/2)^m / \Gamma(m/2) \bigr)$ is a constant that depends on the alphabet size, and $\varepsilon_n$ is a sequence in $n$ that vanishes as $n \to \infty$. □

Proof: Recall that $\rho > 0$. $P_n$ is the joint PMF of the $n$-string with individual letter probabilities $P$. Let $P_n \mapsto P_n'$ according to the mapping given in (24). It is easy to verify that $P_n'$ is the joint PMF of the $n$-string with individual letter probabilities $P'$, where $P \mapsto P'$ according to the mapping (24). Therefore $P_n'$ also belongs to $\mathcal{T}$. Furthermore, for a fixed $a^n \in \mathcal{A}^n$, let $S_{a^n}$ be the PMF of letter frequencies in $a^n$, and define

$$S_{a^n,n}(a^n) = \prod_{i=1}^{n} S_{a^n}(a_i)$$

for every $a^n \in \mathcal{A}^n$. Note that $a^n \mapsto S_{a^n,n}(a^n)$ is not necessarily a PMF. Xie and Barron [20, Theorem 2] show that there is a PMF $Q_n'$ on $\mathcal{A}^n$ and a vanishing sequence $\varepsilon_n$ such that, for every discrete memoryless source $P_n'$, the following holds:

$$\log \frac{P_n'(a^n)}{Q_n'(a^n)} \le \max_{a^n \in \mathcal{A}^n} \log \frac{S_{a^n,n}(a^n)}{Q_n'(a^n)} \tag{60}$$

$$= \frac{m-1}{2} \log \frac{n}{2\pi} + u_m + \varepsilon_n \triangleq r_n. \tag{61}$$

Define the PMF $Q_n$ as follows:

$$Q_n(\cdot) \propto \bigl( Q_n'(\cdot) \bigr)^{1+\rho},$$

the inverse of the mapping in (24). We then have the following series of inequalities:

$$L_\alpha(P_n, Q_n) = \frac{1}{\rho} \log \sum_{a^n \in \mathcal{A}^n} P_n'(a^n) \left( \frac{P_n'(a^n)}{Q_n'(a^n)} \right)^{\rho} \tag{62}$$

$$\le \frac{1}{\rho} \log \Bigl( \sum_{a^n \in \mathcal{A}^n} P_n'(a^n) \exp\{\rho r_n\} \Bigr) \tag{63}$$

$$= \frac{1}{\rho} \log \bigl( \exp\{\rho r_n\} \bigr) = r_n,$$

where (62) follows from (25) and (63) from (61). Taking the supremum over all $P_n$ yields the theorem. ■
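The constant $r_n$ in (61) is the Xie-Barron asymptotic value of the minimax pointwise regret. For a binary alphabet ($m=2$, so $u_2=\log\pi$), the exact minimax regret is the logarithm of the Shtarkov sum $\sum_{x^n}\max_{p}P_p(x^n)$. That sum can be grouped by the number of 1s and evaluated directly. The sketch below works in nats and uses hypothetical function names. It computes both quantities so that one can watch the gap shrink as $n$ grows. It illustrates (61) and is not part of the proof.

```python
import math


def log_shtarkov_binary(n):
    """Natural log of sum over x^n of max_p p^k (1-p)^(n-k), grouped by k = number of 1s."""
    terms = []
    for k in range(n + 1):
        t = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        if 0 < k < n:  # the maximizing p is k/n; the k = 0 and k = n terms equal 1
            t += k * math.log(k / n) + (n - k) * math.log((n - k) / n)
        terms.append(t)
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def r_asymptotic(n, m):
    """(m-1)/2 log(n/(2 pi)) + u_m with u_m = log(Gamma(1/2)^m / Gamma(m/2)), as in (61) without eps_n."""
    return (m - 1) / 2 * math.log(n / (2 * math.pi)) + m * math.lgamma(0.5) - math.lgamma(m / 2)


for n in (10, 100, 1000):
    print(n, log_shtarkov_binary(n), r_asymptotic(n, 2))
```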

Remark: Redundancy in guessing is thus upper bounded by $r_n + \log(1 + n \ln |\mathcal{A}|)$. Since the $L_\alpha$-radius grows with $n$ as $O(\log n)$, the normalized redundancy $C_n/n$ vanishes. This implies that we can get a "universal" guessing strategy. Theorem 16 suggests the use of $Q_n$, which in general may depend on $\rho$. Arikan and Merhav's technique of guessing in the order of increasing empirical entropy is another universal guessing technique.

Given any guessing scheme, how do we "measure" the set of DMSs which result in relatively large redundancy? The 
following theorem answers this question, and uses a strong version of the redundancy capacity theorem of universal coding in 
[21] and [22]. 

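Theorem 17: Let $Q_n$ be any PMF on $\mathcal{A}^n$. Let $\mu$ be a probability measure on $(\mathcal{T}, \mathscr{T})$, and let $P_{\mu,n}' = \int_{\mathcal{T}} d\mu(P_n) \, P_n'$. Then for any DMS $P_n$, we have

$$L_\alpha(P_n, Q_n) \ge D(P_n' \,\|\, P_{\mu,n}') - \lambda_n$$

except on a set $B$ of $\mu$-probability at most $2^{-n\lambda_n}$.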

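Proof: Observe that $\rho > 0$. An application of Jensen's inequality to the concave function $\log(\cdot)$ yields

$$L_\alpha(P_n, Q_n) = \frac{1}{\rho} \log \sum_{a^n \in \mathcal{A}^n} P_n'(a^n) \left( \frac{P_n'(a^n)}{Q_n'(a^n)} \right)^{\rho}$$

$$\ge \frac{1}{\rho} \sum_{a^n \in \mathcal{A}^n} P_n'(a^n) \log \left( \frac{P_n'(a^n)}{Q_n'(a^n)} \right)^{\rho}$$

$$= D(P_n' \,\|\, Q_n').$$

The theorem then follows from [22, Theorem 2], which states that the redundancy in source compression $D(P_n' \,\|\, Q_n')$ is at least as large as $D(P_n' \,\|\, P_{\mu,n}') - \lambda_n$, except on a set $B$ of $\mu$-probability upper bounded by $2^{-n\lambda_n}$. ■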

Remark: In particular, we may do the following. We choose $\mu$ such that $D(P_n' \,\|\, P_{\mu,n}') \approx r_n$. This can be done since the inf-sup value $\inf_{Q_n'} \sup_{P_n'} D(P_n' \,\|\, Q_n')$ is $r_n$, as remarked in [20, Remark 5 after Theorem 2]. We may then choose $\lambda_n$ such that $n\lambda_n \to \infty$, so that $2^{-n\lambda_n}$ vanishes with $n$, while $\lambda_n$ remains negligibly small compared to $r_n$. For example, for the family of DMSs, $r_n = O(\log n)$, and therefore we may set $\lambda_n = (\log \log n)/(\log n)$. Then the set of sources $P$ for which $L_\alpha(P_n, Q_n) < r_n - \lambda_n$ has negligible $\mu$-probability for all sufficiently large $n$. Equivalently, with high $\mu$-probability (at least $1 - 2^{-n\lambda_n}$), $L_\alpha(P_n, Q_n) \ge r_n - \lambda_n$.

Since $L_\alpha$ quantifies the redundancy in Campbell's coding problem to within unity, the above remark leads us to conclude that the redundancy in that problem is tightly bounded by $r_n$, i.e., it grows as $\frac{m-1}{2} \log n$ (up to a constant).

In the guessing context, the nuisance term $\log(1 + n \ln m)$ grows as $\log n + \log \ln m$ for large $n$. We deduce that with high $\mu$-probability (at least $1 - 2^{-n\lambda_n}$), the guessing redundancy of any strategy is at least $r_n - \lambda_n - \log(1 + n \ln m)$, which for large $n$ is

$$\frac{m-3}{2} \log n + u_m - \frac{m-1}{2} \log(2\pi) - \log \ln m + \varepsilon_n - \lambda_n. \tag{64}$$

This fact and Theorem 16 immediately lead us to conclude that for $m \ge 4$, the redundancy is between $\frac{m-3}{2} \log n$ and $\frac{m+1}{2} \log n$ for large $n$ (ignoring constants and smaller-order terms). For $m = 2$ and $m = 3$, the lower bound in (64) is useless, and the upper bound $\frac{m+1}{2} \log n$ may not be tight. The case of $m = 2$ is addressed in the next subsection. Tighter upper bounds for $m = 3$ remain to be found.
B. Guessing an unknown binary memoryless source

The $L_\alpha$-based bounding technique suggested by Theorem 16 provides good bounds on guessing redundancy for large $n$ when the DMS's alphabet size satisfies $m \ge 4$. In this subsection, we identify tighter upper bounds on the guessing redundancy of a binary memoryless source using a more direct approach.

Let $\mathcal{A} = \{0, 1\}$. There is only one unknown parameter, namely $p = P(1)$. The probability of any $n$-string is given by

$$P_n(x^n) = p^{N(x^n)} (1-p)^{n - N(x^n)} = (1-p)^n \left( \frac{p}{1-p} \right)^{N(x^n)},$$

where $N(x^n)$ is the number of 1s in the string $x^n$. Since $P_n(x^n)$ is monotonic in $N(x^n)$, the optimal guessing order for $p > 1/2$ follows immediately. The guesser starts with the string of all 1s, then tries all strings with exactly one 0, then all strings with exactly two 0s, and so on, viz., in the decreasing order of the number of 1s in the string. Note that the optimal guessing sequence is the same for all sources with $p > 1/2$. Exactly the opposite is true when $p < 1/2$. The guessing then proceeds in the increasing order of the number of 1s, the first guess being the all-0 sequence.

Thus there are only two optimal guessing lists for the binary memoryless source. Guessing alternately one element from each list, and skipping those already guessed, gives a guessing list that needs at most twice the optimal number of guesses, i.e., $G(x^n) \le 2 G_{P_n}(x^n)$ for every $x^n \in \mathcal{A}^n$. This guessing list is one of those that proceed in the increasing order of empirical entropy. Clearly then, the redundancy is upper bounded by the constant $\log 2$, a bound tighter than Theorem 16. The normalized redundancy therefore vanishes as $(\log 2)/n$. It is not known if this is the tightest upper bound.
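The interleaving argument can be checked by brute force for small $n$. The sketch below builds the two optimal lists and merges them by alternating and skipping repeats. It then computes $(1/\rho)\log E[G^{\rho}]$ for a Bernoulli($p$) source. It is a minimal illustration with hypothetical function names. Tie-breaking inside a class of strings with the same number of 1s is arbitrary (Python's stable sort), and this does not affect the moments.

```python
import itertools
import math


def optimal_order(n, p):
    """All binary n-strings in non-increasing Bernoulli(p) probability."""
    strings = list(itertools.product((0, 1), repeat=n))
    if p > 0.5:
        return sorted(strings, key=lambda s: -sum(s))  # most 1s first
    return sorted(strings, key=lambda s: sum(s))       # most 0s first


def interleave(first, second):
    """Alternate between two guessing lists, skipping strings already guessed."""
    seen, merged = set(), []
    for a, b in zip(first, second):
        for s in (a, b):
            if s not in seen:
                seen.add(s)
                merged.append(s)
    return merged


def log_moment(order, p, rho):
    """(1/rho) log E[G^rho] for a Bernoulli(p) source guessed in the given order."""
    total = 0.0
    for position, s in enumerate(order, start=1):
        k = sum(s)
        total += p ** k * (1 - p) ** (len(s) - k) * position ** rho
    return math.log(total) / rho


n = 8
G = interleave(optimal_order(n, 0.9), optimal_order(n, 0.1))
for p in (0.2, 0.8):
    redundancy = log_moment(G, p, 1.0) - log_moment(optimal_order(n, p), p, 1.0)
    print(p, redundancy, math.log(2))  # redundancy should not exceed log 2
```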



C. Arbitrarily varying sources

For the family of DMSs, we saw in Section VII-A that the redundancy is upper bounded by $O(\log n)$. In this subsection we look at the example of finite-state arbitrarily varying sources (FS-AVSs), for which the redundancy grows linearly with $n$. Yet again, for exposition purposes, we assume $|\mathcal{Y}| = 1$.

As before, let $\mathcal{X} = \mathcal{A}^n$. Let $\mathcal{S}$ be a finite set of states, and for each $s \in \mathcal{S}$, let $P(\cdot \mid s)$ be a PMF on the finite set $\mathcal{A}$. An arbitrarily varying source (AVS) is a sequence of $\mathcal{A}$-valued random variables $X_1, X_2, \ldots$ such that the $X_i$'s are independent and the probability of an $n$-string is governed by an arbitrary state sequence $s^n \in \mathcal{S}^n$ as follows:

$$P_n(x^n \mid s^n) = \prod_{i=1}^{n} P(x_i \mid s_i).$$

Observe that for a fixed $n$, there are only $|\mathcal{S}|^n$ sources in the uncertainty set. Let $T_{s^n}$ be the subset of all sequences in $\mathcal{S}^n$ with the same letter frequencies as $s^n$. $T_{s^n}$ is also referred to as the type of the sequence $s^n$ [23]. If the letter frequencies are given by a PMF $U$ on $\mathcal{S}$, we refer to $T_U$ as the type of sequences. Let $V$ be a stochastic matrix given by $V(x \mid s)$ for $x \in \mathcal{A}$ and $s \in \mathcal{S}$. Then, for a particular sequence $s^n$, we refer to $T_V(s^n)$, the set of sequences that are of conditional type $V$ given $s^n$, as the $V$-shell of $s^n$.

Proposition 18: Let $0 < \alpha < 1$. Let $T_U$ be a type of sequences on $\mathcal{S}^n$. Let the uncertainty set be $\mathcal{T} = \{ P_n(\cdot \mid s^n) \mid s^n \in T_U \}$. The $L_\alpha$-radius of this family is given by

$$R_n(T_U) = H_\alpha(Q_n^*) - \frac{1}{|T_U|} \sum_{s^n \in T_U} H_\alpha\bigl( P_n(\cdot \mid s^n) \bigr), \tag{65}$$

where the $L_\alpha$-center $Q_n^*$ is given by

$$Q_n^*(x^n) = \frac{1}{|T_U|} \sum_{s^n \in T_U} P_n(x^n \mid s^n), \quad x^n \in \mathcal{A}^n. \tag{66}$$

□

Remarks: 1) It will be apparent from the proof that the quantity $H_\alpha(P_n(\cdot \mid s^n))$ in (65) depends on $s^n$ only through its type. Hence the average over all sequences in the type may be replaced by the value for any specific $s^n \in T_U$.

2) All PMFs in the uncertainty set are spaced equally apart (in the sense of $L_\alpha$-divergence) from the $L_\alpha$-center $Q_n^*$.

3) Guessing in the decreasing order of $Q_n^*$-probabilities results in a redundancy in guessing that is upper bounded by

$$R_n(T_U) + \log(1 + n \ln |\mathcal{A}|).$$

4) $\mathrm{sign}(\rho) \cdot h(P)$ is a concave function of $P$. It follows from (31) that $H_\alpha(P)$ is also a concave function of $P$ for $0 < \alpha < 1$. By Jensen's inequality, $R_n(T_U) \ge 0$. (For $\alpha > 1$, $H_\alpha(P)$ is neither concave nor convex in $P$.)

5) For any guessing strategy, there exists at least one sequence $s^n \in T_U$ for which the redundancy is lower bounded by $R_n(T_U) - \log(1 + n \ln |\mathcal{A}|)$. Proposition 20 will show what happens when the $U$ sequence (parameterized by $n$) converges as $n \to \infty$ to a PMF $U^* \in \mathcal{P}(\mathcal{S})$. In that case $\frac{1}{n} R_n(T_U)$ converges to a constant, which is strictly positive for all but trivial sources (see the remarks after Proposition 20). Thus $R_n(T_U)$ grows linearly with $n$, which makes the converse meaningful, since the nuisance term $\log(1 + n \ln |\mathcal{A}|)$ grows only logarithmically in $n$.

Proof: Note that, given $n$, the uncertainty set is finite. We will simply show that the candidate $L_\alpha$-center satisfies the necessary and sufficient condition (48) given in Section VI-A.3. From (33), it is sufficient to show that

$$I_f\bigl( P_n'(\cdot \mid s^n) \,\|\, Q_n^{*\prime} \bigr) = \frac{\sum_{x^n \in \mathcal{A}^n} P_n(x^n \mid s^n) \bigl( Q_n^{*\prime}(x^n) \bigr)^{-\rho}}{h\bigl( P_n(\cdot \mid s^n) \bigr)} = K \tag{67}$$

for every $s^n \in T_U$, where $K$ is some constant that depends only on $n$ and $T_U$. We will show that the numerator and the denominator in (67) do not depend on the actual $s^n$, so long as $s^n \in T_U$.

Observe that the stochastic matrix that defines the conditional PMF is given by $P(x \mid s)$ for $x \in \mathcal{A}$ and $s \in \mathcal{S}$. Consider $h(P_n(\cdot \mid s^n))$. First,

$$\sum_{x^n \in \mathcal{A}^n} \bigl( P_n(x^n \mid s^n) \bigr)^{\alpha} = \sum_{V} |T_V(s^n)| \exp\bigl\{ -n\alpha \bigl[ D(V \,\|\, P \mid U) + H(V \mid U) \bigr] \bigr\},$$

where the sum is over all conditional types $V$. All the quantities inside the summation, including $|T_V(s^n)|$, depend on $s^n$ only through $T_U$. Therefore $h(P_n(\cdot \mid s^n))$ depends on $s^n$ only through $T_U$.

Next, $Q_n^*(x^n)$ depends on $x^n$ only through $T_{x^n}$. This is easily seen via a permutation argument. Take two $\mathcal{A}$-sequences of the same type, $x^n$ and $(x_{\pi(1)}, \ldots, x_{\pi(n)})$. Let $\pi$ be a permutation that takes $(x^n, s^n)$ to $\bigl( (x_{\pi(1)}, \ldots, x_{\pi(n)}), (s_{\pi(1)}, \ldots, s_{\pi(n)}) \bigr)$. This permutation $\pi$ leaves $P_n(x^n \mid s^n)$ unchanged. Moreover, the sum continues to be over

$$T_U = \bigl\{ (s_{\pi(1)}, s_{\pi(2)}, \ldots, s_{\pi(n)}) \in \mathcal{S}^n \;\big|\; s^n \in T_U \bigr\}.$$

Thus $\sum_{s^n \in T_U} P_n(x^n \mid s^n)$, and therefore $Q_n^*(x^n)$, depend on $x^n$ only through $T_{x^n}$.

Finally, given two $\mathcal{S}$-sequences of the same type $T_U$, the above permutation argument indicates that the numerator of (67) depends on $s^n$ only through $T_U$.

That $R_n(T_U)$ is given by (65) follows from (47), (66), (67), the fact that $h(P_n(\cdot \mid s^n))$ is constant over all $s^n \in T_U$, and (31). This concludes the proof. ■
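Here is a small numerical instance of Proposition 18. It assumes that $Q_n^*$ is the uniform mixture of the sources in $T_U$. It uses $L_\alpha(P,Q)=\frac{1}{\rho}\log I(P,Q)$ with $I(P,Q)=\frac{1}{h(P)}\sum_x P(x)Q'(x)^{-\rho}$ and $Q'\propto Q^{\alpha}$ (no side information, $0<\alpha<1$). The two-state channel, $n=4$, and the type with two 0s and two 1s are arbitrary choices for illustration, and the helper names are hypothetical. Every source in the family should lie at the same $L_\alpha$-distance from $Q_n^*$, and that distance should equal the right-hand side of (65).

```python
import itertools
import math
import random


def H_alpha(P, alpha):
    """Renyi entropy (1/(1-alpha)) log sum_x P(x)^alpha, in nats."""
    return math.log(sum(p ** alpha for p in P if p > 0)) / (1 - alpha)


def L_alpha(P, Q, alpha):
    """L_alpha(P,Q) = (1/rho) log I(P,Q) for 0 < alpha < 1 and no side information."""
    rho = 1.0 / alpha - 1.0
    hP = sum(p ** alpha for p in P) ** (1.0 / alpha)
    z = sum(q ** alpha for q in Q)
    I = sum(p * (q ** alpha / z) ** (-rho) for p, q in zip(P, Q) if p > 0) / hP
    return math.log(I) / rho


def avs_pmf(states, channel):
    """P_n(. | s^n) on A^n, listed in itertools.product order."""
    n_letters = len(channel[0])
    return [math.prod(channel[s][x] for s, x in zip(states, xs))
            for xs in itertools.product(range(n_letters), repeat=len(states))]


channel = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical P(x | s) with S = A = {0, 1}
n, alpha = 4, 0.5
type_class = [s for s in itertools.product((0, 1), repeat=n) if sum(s) == 2]
family = [avs_pmf(s, channel) for s in type_class]
Q_star = [sum(col) / len(family) for col in zip(*family)]  # uniform mixture over T_U
radius = H_alpha(Q_star, alpha) - sum(H_alpha(P, alpha) for P in family) / len(family)
print(radius, [L_alpha(P, Q_star, alpha) for P in family])

rng = random.Random(0)
Q = [q * (1 + 0.2 * (rng.random() - 0.5)) for q in Q_star]
Q = [q / sum(Q) for q in Q]
print(max(L_alpha(P, Q, alpha) for P in family) >= radius)
```

Perturbing $Q_n^*$ should never bring $\max_{P}L_\alpha(P,Q)$ below this common distance, which is consistent with $Q_n^*$ being the center.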

The number of different types of sequences grows polynomially in $n$; in particular, this number is upper bounded by $(n+1)^{|\mathcal{S}|}$. We can use this fact to stitch together the guessing lists for the different types of sequences on $\mathcal{S}^n$. The result is one list that does only marginally worse than the list obtained by knowing the type of the state sequence.

Proposition 19: Let $0 < \alpha < 1$. Let the uncertainty set be $\mathcal{T} = \{ P_n(\cdot \mid s^n) \mid s^n \in \mathcal{S}^n \}$. There is a guessing strategy such that, for every $T_U$, the redundancy is upper bounded by

$$R_n(T_U) + \log(1 + n \ln |\mathcal{A}|) + |\mathcal{S}| \log(n+1)$$

whenever $s^n \in T_U$. □

Proof: Let $N$ be the number of types; $N$ is upper bounded by $(n+1)^{|\mathcal{S}|}$. Fix an arbitrary order on these types. Let the $k$th type be $T_{U_k}$. Set $G_k = G_{T_{U_k}}$, where $G_{T_U}$ is the guessing strategy that is obtained knowing that $s^n \in T_U$, via Proposition 18. It proceeds in the decreasing order of probabilities of the $L_\alpha$-center of the uncertainty set indexed by $T_U$.

We now stitch together the guessing lists $G_1, G_2, \ldots, G_N$ to get a new guessing list $G$, as follows. Think of $G_k$ as a column vector of size $|\mathcal{A}^n| \times 1$ whose $j$th entry is the $j$th guess. Let $H$ be the column vector of size $N \cdot |\mathcal{A}^n| \times 1$ obtained by reading the entries of the matrix $[G_1 \; G_2 \; \cdots \; G_N]$ in raster order (one row after another). Every $x^n \in \mathcal{A}^n$ figures exactly once in each $G_k$ list, and therefore occurs exactly $N$ times in the $H$ list. Next, prune the $H$ list: for each $i$, if there exists an index $j$ with $j < i$ and $H_i = H_j$, set $H_i = \delta$. This indicates that the $i$th string already figures in the final guessing list. Finally, remove all $\delta$'s to obtain the desired guessing list $G : \mathcal{A}^n \to \{1, 2, \ldots, |\mathcal{A}|^n\}$. Here $G(x^n)$ is the unique position at which $x^n$ occurs in the pruned $H$ list.

Clearly, for every $x^n$ and for every $k$ such that $1 \le k \le N$, we have $G(x^n) \le N G_k(x^n)$. Indeed, $x^n$ occurs in the position $(G_k(x^n), k)$ in the matrix constructed above. It therefore occurs in position $(G_k(x^n) - 1)N + k$ in the unpruned $H$ list, and hence no later than position $N G_k(x^n)$. Pruning can only move it to an earlier position, and thus $G(x^n) \le N G_k(x^n)$.

The above observation leads to

$$\frac{1}{\rho} \log E\bigl[ G(X^n)^{\rho} \bigr] \le \frac{1}{\rho} \log E\bigl[ G_k(X^n)^{\rho} \bigr] + \log N.$$

The proposition follows from Theorem 6, Proposition 18, and the bound $N \le (n+1)^{|\mathcal{S}|}$. ■
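The stitching construction is easy to code. The sketch below merges $N$ guessing lists, each given as items in guessing order. It reads them row by row and keeps only first occurrences, which is the pruning step. The integer "strings" and the random lists are placeholders for $\mathcal{A}^n$ and the lists $G_k$. The final line checks the positional bound $G(x)\le N\,G_k(x)$ used above.

```python
import math
import random


def stitch(lists):
    """Merge N guessing lists by reading the matrix [G_1 ... G_N] row by row
    and keeping only the first occurrence of each item (the pruning step)."""
    seen, merged = set(), []
    for row in zip(*lists):  # row j holds the j-th guess of every list
        for x in row:
            if x not in seen:
                seen.add(x)
                merged.append(x)
    return merged


rng = random.Random(0)
universe = list(range(20))  # hypothetical stand-in for A^n
lists = [rng.sample(universe, len(universe)) for _ in range(3)]
G = stitch(lists)
position = {x: i for i, x in enumerate(G, start=1)}
print(all(position[x] <= len(lists) * (Gk.index(x) + 1) for Gk in lists for x in universe))
```

Because the bound holds pointwise, it also gives $E[G^{\rho}]\le N^{\rho}E[G_k^{\rho}]$ for any source distribution, which is the moment inequality above.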

We finally remark that the min-sup redundancy for the finite-state arbitrarily varying source grows linearly with $n$ under some circumstances.

Proposition 20: For a fixed $n$, let $U$ be a PMF on $\mathcal{S}$ and $T_U$ the corresponding type. Let the sequence $U$ (as a function of $n$) converge to a PMF $U^* \in \mathcal{P}(\mathcal{S})$ as $n \to \infty$. Then

$$\lim_{n} \frac{1}{n} R_n(T_U) = R,$$

where $R \ge 0$. □

Proof: The second term on the right-hand side of (65), after normalization by $n$, converges to a nonnegative real number, as seen below:

$$\frac{1}{n} H_\alpha\bigl( P_n(\cdot \mid s^n) \bigr) = \frac{1}{n(1-\alpha)} \log \prod_{s \in \mathcal{S}} \Bigl[ \sum_{x \in \mathcal{A}} P(x \mid s)^{\alpha} \Bigr]^{nU(s)} = \sum_{s \in \mathcal{S}} U(s) H_\alpha\bigl( P(\cdot \mid s) \bigr) \to \sum_{s \in \mathcal{S}} U^*(s) H_\alpha\bigl( P(\cdot \mid s) \bigr). \tag{68}$$

We next consider the first term on the right-hand side of (65) after normalization, i.e., $H_\alpha(Q_n^*)/n$, where $Q_n^*$ is given by (66).

Lemma 21: For a fixed $n$, let $U$ be a PMF on $\mathcal{S}$ and $T_U$ the corresponding type. Let the sequence $U$ (as a function of $n$) converge to a PMF $U^* \in \mathcal{P}(\mathcal{S})$ as $n \to \infty$. Let $V$ be the output PMF when the input PMF on $\mathcal{S}$ is $U$ and the channel is $P$. Furthermore, let $V^*$ be the limiting output PMF as $n \to \infty$. Then $\lim_n \frac{1}{n} H_\alpha(Q_n^*) = H_\alpha(V^*)$. □

As a consequence of this lemma and (68), we have

$$\frac{1}{n} R_n(T_U) \to H_\alpha(V^*) - \sum_{s \in \mathcal{S}} U^*(s) H_\alpha\bigl( P(\cdot \mid s) \bigr) = R.$$

By the strict concavity of $H_\alpha(\cdot)$ for $0 < \alpha < 1$, and Jensen's inequality, we have $R \ge 0$. This concludes the proof of the proposition. ■
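Both limits in the proof of Proposition 20 are easy to evaluate. For a fixed $s^n$, (68) is in fact an exact identity before the limit, since $H_\alpha$ is additive over independent letters. The sketch below uses natural logarithms and a hypothetical two-state channel. It checks that identity by brute force and computes $R=H_\alpha(V^*)-\sum_s U^*(s)H_\alpha(P(\cdot\mid s))$ for a given $U^*$.

```python
import itertools
import math


def H_alpha(P, alpha):
    """Renyi entropy (1/(1-alpha)) log sum_x P(x)^alpha, in nats."""
    return math.log(sum(p ** alpha for p in P if p > 0)) / (1 - alpha)


def avs_pmf(states, channel):
    """P_n(. | s^n) on A^n, listed in itertools.product order."""
    n_letters = len(channel[0])
    return [math.prod(channel[s][x] for s, x in zip(states, xs))
            for xs in itertools.product(range(n_letters), repeat=len(states))]


def rate_R(U, channel, alpha):
    """R = H_alpha(V) - sum_s U(s) H_alpha(P(.|s)), with V the output PMF of U through P."""
    V = [sum(U[s] * channel[s][x] for s in range(len(U))) for x in range(len(channel[0]))]
    return H_alpha(V, alpha) - sum(U[s] * H_alpha(channel[s], alpha) for s in range(len(U)))


channel = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical P(x | s)
print(rate_R([0.5, 0.5], channel, 0.5))
```

For the channel used above, $R$ is positive for $U^*=(1/2,1/2)$ and vanishes for a degenerate $U^*$ or for identical rows, matching the remarks that follow.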

Remarks: $R = 0$ if and only if either (i) $U^*(s) = 1$ for some $s \in \mathcal{S}$, or (ii) $P(\cdot \mid s)$ does not depend on $s$, i.e., the state does not affect the source. Thus, for all but the trivial finite-state arbitrarily varying sources, the min-sup redundancy grows linearly with $n$ at a rate $R$. This means that the guessing strategy that achieves the min-sup redundancy has an exponential growth rate strictly bigger than that of the best strategy obtained with knowledge of the state sequence.

We now prove the rather technical Lemma 21.

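Proof:

(a) We first show that $\limsup_n \frac{1}{n} H_\alpha(Q_n^*) \le H_\alpha(V^*)$. Let $U_n$ be the PMF on $\mathcal{S}^n$ given by $U_n(s^n) = \prod_{i=1}^{n} U(s_i)$. Let $U_n\{T\}$ denote the $U_n$-probability of the set $T$. From (66), we may write

$$\sum_{x^n \in \mathcal{A}^n} \bigl( Q_n^*(x^n) \bigr)^{\alpha} = \sum_{x^n \in \mathcal{A}^n} \Bigl( \frac{1}{U_n\{T_U\}} \sum_{s^n \in T_U} U_n(s^n) P_n(x^n \mid s^n) \Bigr)^{\alpha} \tag{69}$$

$$\le (n+1)^{|\mathcal{S}|\alpha} \sum_{x^n \in \mathcal{A}^n} \Bigl( \sum_{s^n \in \mathcal{S}^n} U_n(s^n) P_n(x^n \mid s^n) \Bigr)^{\alpha} \tag{70}$$

$$= (n+1)^{|\mathcal{S}|\alpha} \sum_{x^n \in \mathcal{A}^n} V_n(x^n)^{\alpha}$$

$$= (n+1)^{|\mathcal{S}|\alpha} \Bigl( \sum_{x \in \mathcal{A}} V(x)^{\alpha} \Bigr)^{n}, \tag{71}$$

where $V_n(x^n) = \prod_{i=1}^{n} V(x_i)$. Here (69) follows from the observation that $U_n(s^n) = U_n\{T_U\}/|T_U|$ for all $s^n \in T_U$. Step (70) follows from $U_n\{T_U\} \ge (n+1)^{-|\mathcal{S}|}$ (see the proof of [23, Lemma 2.3]) and by enlarging the sum over $T_U$ to all of $\mathcal{S}^n$. From (71) and (31), we have

$$\frac{1}{n} H_\alpha(Q_n^*) \le \frac{\alpha |\mathcal{S}| \log(n+1)}{(1-\alpha) n} + H_\alpha(V) \to H_\alpha(V^*).$$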

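(b) We now show that $\liminf_n \frac{1}{n} H_\alpha(Q_n^*) \ge H_\alpha(V^*)$. For a given PMF $U$ on $\mathcal{S}$ and conditional PMF $P$, let $V$ be the induced PMF on $\mathcal{A}$ and $W$ the reverse conditional PMF, i.e., $W(s \mid x)$ is the probability of state $s$ given $x$. Continuing from (69), we may write

$$\sum_{x^n \in \mathcal{A}^n} \bigl( Q_n^*(x^n) \bigr)^{\alpha} = \sum_{x^n \in \mathcal{A}^n} \Bigl( \frac{1}{U_n\{T_U\}} \sum_{s^n \in T_U} U_n(s^n) P_n(x^n \mid s^n) \Bigr)^{\alpha}$$

$$\ge \sum_{x^n \in T_Q} \Bigl( \sum_{s^n \in T_U} U_n(s^n) P_n(x^n \mid s^n) \Bigr)^{\alpha} \tag{72}$$

$$\ge \sum_{x^n \in T_Q} \Bigl( \sum_{s^n \in T_{\tilde{W}}(x^n)} V_n(x^n) W_n(s^n \mid x^n) \Bigr)^{\alpha} \tag{73}$$

$$= \sum_{x^n \in T_Q} \Bigl( V_n(x^n) \, W_n\{ T_{\tilde{W}}(x^n) \mid x^n \} \Bigr)^{\alpha}. \tag{74}$$

Step (72) holds because $U_n\{T_U\}^{\alpha} \le 1$ and because the sum over $\mathcal{A}^n$ is restricted to a sum over a type $T_Q$ to be chosen later. Step (73) holds because $U_n(s^n) P_n(x^n \mid s^n) = V_n(x^n) W_n(s^n \mid x^n)$ and because the sum over $s^n$ is now restricted to a non-void $\tilde{W}$-shell of $x^n$, where $\tilde{W}$ will be chosen later.

We next observe that for $x^n \in T_Q$, the following hold:

$$W_n\{ T_{\tilde{W}}(x^n) \mid x^n \} \ge (n+1)^{-|\mathcal{S}||\mathcal{A}|} \, 2^{-n D(\tilde{W} \| W \mid Q)},$$

$$|T_Q| \ge (n+1)^{-|\mathcal{A}|} \, 2^{n H(Q)}.$$

Substitution of these inequalities into (74) yields

$$\sum_{x^n \in \mathcal{A}^n} \bigl( Q_n^*(x^n) \bigr)^{\alpha} \ge (n+1)^{-|\mathcal{A}|(1 + \alpha|\mathcal{S}|)} \, 2^{n [ (1-\alpha) H(Q) - \alpha ( D(Q \| V) + D(\tilde{W} \| W \mid Q) ) ]},$$

and therefore

$$\frac{1}{n} H_\alpha(Q_n^*) \ge H(Q) - \frac{1}{\rho} \bigl[ D(Q \,\|\, V) + D(\tilde{W} \,\|\, W \mid Q) \bigr] - \frac{|\mathcal{A}| (1 + \alpha |\mathcal{S}|) \log(n+1)}{(1-\alpha) n} \tag{75}$$

for any type $Q$ of sequences and for any $\tilde{W}$ such that $T_{\tilde{W}}(x^n) \subset T_U$ is a non-void shell for an $x^n \in T_Q$. Clearly, the last term in (75) vanishes as $n \to \infty$.

If we could choose $Q = V'$ and $\tilde{W} = W$, we would be done, since $H_\alpha(V) = H(V') - \frac{1}{\rho} D(V' \,\|\, V)$. We cannot do this if $V'$ is not a type of sequences, or if $W$ is not a conditional type given an $x^n$. But we will show that as $n \to \infty$, we can get close enough. The following arguments make this idea precise.

Define

$$\delta = \min\{ W(s \mid x) \mid W(s \mid x) > 0, \ s \in \mathcal{S}, \ x \in \mathcal{A} \}$$

and consider $D(\tilde{W}(\cdot \mid x) \,\|\, W(\cdot \mid x))$. We may restrict our choice of $\tilde{W}$ to those that are absolutely continuous with respect to $W$, i.e., $\tilde{W}(\cdot \mid x) \ll W(\cdot \mid x)$ for every $x \in \mathcal{A}$. For sufficiently large $n$, we can choose such a $\tilde{W}$ that in addition satisfies

$$\sum_{s \in \mathcal{S}} | \tilde{W}(s \mid x) - W(s \mid x) | \le \varepsilon_n \le \tfrac{1}{2} \quad \forall x \in \mathcal{A},$$

and $\varepsilon_n \to 0$. We then have

$$D\bigl( \tilde{W}(\cdot \mid x) \,\|\, W(\cdot \mid x) \bigr) = H\bigl( W(\cdot \mid x) \bigr) - H\bigl( \tilde{W}(\cdot \mid x) \bigr) + \sum_{s \in \mathcal{S}} \bigl( \tilde{W}(s \mid x) - W(s \mid x) \bigr) \log \frac{1}{W(s \mid x)}$$

$$\le \bigl| H\bigl( \tilde{W}(\cdot \mid x) \bigr) - H\bigl( W(\cdot \mid x) \bigr) \bigr| - (\log \delta) \sum_{s \in \mathcal{S}} | \tilde{W}(s \mid x) - W(s \mid x) |$$

$$\le -\varepsilon_n \log \frac{\varepsilon_n}{|\mathcal{S}|} - \varepsilon_n \log \delta, \tag{76}$$

where (76) follows from [23, Lemma 2.7]. After averaging, we get

$$D(\tilde{W} \,\|\, W \mid Q) \le -\varepsilon_n \log \frac{\varepsilon_n}{|\mathcal{S}|} - \varepsilon_n \log \delta \to 0.$$

A similar argument shows that

$$H(Q) - \frac{1}{\rho} D(Q \,\|\, V) = H_\alpha(V) + \bigl[ H(Q) - H(V') \bigr] - \frac{1}{\rho} \bigl[ D(Q \,\|\, V) - D(V' \,\|\, V) \bigr] \to H_\alpha(V^*),$$

where we have made use of the fact that $H_\alpha(V) = H(V') - (1/\rho) D(V' \,\|\, V)$. This concludes the proof of Lemma 21. ■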
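The identity $H_\alpha(V)=H(V')-\frac{1}{\rho}D(V'\|V)$ is one instance of a variational formula. For $0<\alpha<1$ and every PMF $Q$, $H(Q)-\frac{1}{\rho}D(Q\|V)\le H_\alpha(V)$, with equality at $Q=V'\propto V^{\alpha}$. This holds because the difference equals $D(Q\|V')/(1-\alpha)$. The short sketch below computes the gap, using natural logarithms and hypothetical names.

```python
import math
import random


def H(P):
    return -sum(p * math.log(p) for p in P if p > 0)


def D(P, Q):
    return sum(p * math.log(p / q) for p, q in zip(P, Q) if p > 0)


def H_alpha(P, alpha):
    return math.log(sum(p ** alpha for p in P)) / (1 - alpha)


def tilt(P, alpha):
    """V' proportional to V^alpha."""
    z = sum(p ** alpha for p in P)
    return [p ** alpha / z for p in P]


def variational_gap(Q, V, alpha):
    """H_alpha(V) - [H(Q) - (1/rho) D(Q||V)]; nonnegative, zero at Q = V'."""
    rho = 1.0 / alpha - 1.0
    return H_alpha(V, alpha) - (H(Q) - D(Q, V) / rho)


V = [0.5, 0.3, 0.2]
print(variational_gap(tilt(V, 0.5), V, 0.5))  # zero up to rounding
rng = random.Random(0)
Q = [rng.random() + 1e-3 for _ in range(3)]
Q = [q / sum(Q) for q in Q]
print(variational_gap(Q, V, 0.5), D(Q, tilt(V, 0.5)) / 0.5)  # equal up to rounding
```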
VIII. $L_\alpha$-PROJECTION

In this section we look at an interesting geometric property of $L_\alpha$-divergence that makes it behave like squared Euclidean distance, analogous to the Kullback-Leibler divergence. Throughout this section, we assume $\rho > -1$ and $\rho \neq 0$.

We proceed along the lines of [5]. Let $\mathcal{X}$ and $\mathcal{Y}$ be finite alphabet sets. Let $\mathcal{P}(\mathcal{X} \times \mathcal{Y})$ denote the set of PMFs on $\mathcal{X} \times \mathcal{Y}$. Given a PMF $R$ on $\mathcal{X} \times \mathcal{Y}$, the set

$$B(R, r) = \{ P \in \mathcal{P}(\mathcal{X} \times \mathcal{Y}) \mid L_\alpha(P, R) < r \}, \quad 0 < r \le \infty,$$

is called an $L_\alpha$-sphere (or ball) with center $R$ and radius $r$. The term "sphere" conjures the image of a convex set. That the set is indeed convex needs a proof, since $L_\alpha(P, R)$ is not convex in its arguments.

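Proposition 22: $B(R, r)$ is a convex set. □

Proof: Let $P_i \in B(R, r)$ for $i = 0, 1$. For any $\lambda \in [0, 1]$, we need to show that $P_\lambda \triangleq (1 - \lambda) P_0 + \lambda P_1 \in B(R, r)$. With $\alpha = 1/(1 + \rho)$ and $t = \mathrm{sign}(\rho) \cdot 2^{\rho r}$, we get from (32) that

$$I(P_i, R) < t, \quad i = 0, 1. \tag{77}$$

The proof will be complete if we can show that $I(P_\lambda, R) < t$. To this end,

$$I(P_\lambda, R) = \frac{\mathrm{sign}(1-\alpha)}{h(P_\lambda)} \sum_{y \in \mathcal{Y}} \sum_{x \in \mathcal{X}} P_\lambda(x,y) \bigl( R'(x \mid y) \bigr)^{-\rho} = \frac{(1-\lambda) h(P_0) I(P_0, R) + \lambda h(P_1) I(P_1, R)}{h(P_\lambda)} \tag{78}$$

$$< t \cdot \mathrm{sign}(1-\alpha) \cdot \frac{(1-\lambda) \cdot \mathrm{sign}(1-\alpha) h(P_0) + \lambda \cdot \mathrm{sign}(1-\alpha) h(P_1)}{h(P_\lambda)} \tag{79}$$

$$\le t \cdot \mathrm{sign}(1-\alpha) \cdot \frac{\mathrm{sign}(1-\alpha) h(P_\lambda)}{h(P_\lambda)} \tag{80}$$

$$= t,$$

where (78) follows from (34), (79) from (77), and (80) from the concavity of $\mathrm{sign}(1-\alpha) h$, together with $t \cdot \mathrm{sign}(1-\alpha) = |t| > 0$. ■

Proposition 22 shows that $L_\alpha(P, R)$ is a quasiconvex function of $P$, its first argument.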
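Quasiconvexity can be spot-checked numerically. For any $P_0$, $P_1$, $R$ and $\lambda\in[0,1]$, it says $L_\alpha(P_\lambda,R)\le\max\{L_\alpha(P_0,R),L_\alpha(P_1,R)\}$. The sketch below assumes the same form of $I(P,R)$ as in the proof above, with the tilted conditional $R'(x\mid y)\propto R(x,y)^{\alpha}$, and covers both $0<\alpha<1$ and $\alpha>1$. It is a check under that assumption, not a proof, and the helper names are hypothetical.

```python
import math
import random


def h(P, alpha):
    """h(P) = sum_y (sum_x P(x,y)^alpha)^(1/alpha); P is a list of rows P[y][x]."""
    return sum(sum(p ** alpha for p in row) ** (1.0 / alpha) for row in P)


def L_alpha(P, R, alpha):
    """L_alpha(P,R) = (1/rho) log(sign(rho) I(P,R)) with the assumed form of I(P,R)."""
    rho = 1.0 / alpha - 1.0
    sgn = 1.0 if rho > 0 else -1.0
    total = 0.0
    for prow, rrow in zip(P, R):
        z = sum(r ** alpha for r in rrow)
        total += sum(p * (r ** alpha / z) ** (-rho) for p, r in zip(prow, rrow))
    I = sgn * total / h(P, alpha)
    return math.log(sgn * I) / rho


def mix(P0, P1, lam):
    """P_lambda = (1 - lam) P0 + lam P1."""
    return [[(1 - lam) * a + lam * b for a, b in zip(r0, r1)] for r0, r1 in zip(P0, P1)]


def random_joint_pmf(nx, ny, rng):
    vals = [[rng.random() + 1e-3 for _ in range(nx)] for _ in range(ny)]
    z = sum(map(sum, vals))
    return [[v / z for v in row] for row in vals]


rng = random.Random(0)
R, P0, P1 = (random_joint_pmf(3, 2, rng) for _ in range(3))
for alpha in (0.5, 2.0):
    bound = max(L_alpha(P0, R, alpha), L_alpha(P1, R, alpha))
    print(alpha, all(L_alpha(mix(P0, P1, k / 10), R, alpha) <= bound + 1e-12
                     for k in range(11)))
```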

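When we talk of closed sets, we refer to the usual Euclidean metric on $\mathbb{R}^{|\mathcal{X}||\mathcal{Y}|}$. The set of PMFs on $\mathcal{X} \times \mathcal{Y}$ is closed and bounded (and therefore compact).

Let $\mathcal{E}$ be a closed and convex set of PMFs on $\mathcal{X} \times \mathcal{Y}$ that intersects $B(R, \infty)$, i.e., there exists a PMF $P \in \mathcal{E}$ such that $L_\alpha(P, R) < \infty$. Then a PMF $Q \in \mathcal{E}$ satisfying

$$L_\alpha(Q, R) = \min_{P \in \mathcal{E}} L_\alpha(P, R)$$

is called the $L_\alpha$-projection of $R$ on $\mathcal{E}$.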



Proposition 23: (Existence of $L_\alpha$-projection) Let $\mathcal{E}$ be a closed and convex set of PMFs on $\mathcal{X}\times\mathcal{Y}$. If $B(R,\infty)\cap\mathcal{E}$ is nonempty, then $R$ has an $L_\alpha$-projection on $\mathcal{E}$.

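Proof: Pick a sequence $P_n \in \mathcal{E}$ with $L_\alpha(P_n,R) < \infty$ such that $L_\alpha(P_n,R) \to \inf_{P\in\mathcal{E}} L_\alpha(P,R)$. This sequence, being in the compact space $\mathcal{E}$, has a cluster point $Q$ and a subsequence converging to $Q$. We can simply focus on this subsequence and therefore assume that $P_n \to Q$ and $L_\alpha(P_n,R) \to \inf_{P\in\mathcal{E}} L_\alpha(P,R)$. $\mathcal{E}$ is closed and hence $Q \in \mathcal{E}$. The continuity of the logarithm function, wherever it is finite, and the condition $L_\alpha(P_n,R) < \infty$ imply that

$$
\begin{aligned}
\lim_n L_\alpha(P_n,R) &= \frac{1}{\rho}\log\Big(\mathrm{sign}(\rho)\cdot\lim_n I(P_n,R)\Big)\\
&= \frac{1}{\rho}\log\big(\mathrm{sign}(\rho)\cdot I(Q,R)\big) \qquad (81)\\
&= L_\alpha(Q,R),
\end{aligned}
$$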
where (81) follows from the observation that $I(P,R)$ is the ratio of a continuous linear function of $P$ and the continuous concave function $\mathrm{sign}(1-\alpha)h$, which is bounded and, moreover, bounded away from 0.

From the uniqueness of limits, we have that $L_\alpha(Q,R) = \inf_{P\in\mathcal{E}} L_\alpha(P,R)$. $Q$ is then an $L_\alpha$-projection of $R$ on $\mathcal{E}$. ■

We next state generalizations of [5, Lemma 2.1, Theorem 2.2], which show that $L_\alpha(P,Q)$ plays the role of squared Euclidean distance (analogous to the Kullback-Leibler divergence).

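Proposition 24: Let $0 < \alpha < \infty$, $\alpha \neq 1$.

1) Let $L_\alpha(Q,R)$ and $L_\alpha(P,R)$ be finite. The segment joining $P$ and $Q$ does not intersect the $L_\alpha$-sphere $B(R,r)$ with radius $r = L_\alpha(Q,R)$, i.e.,

$$
L_\alpha(P_\lambda,R) \ge L_\alpha(Q,R) \quad \text{for each} \quad P_\lambda = \lambda P + (1-\lambda)Q, \quad 0 \le \lambda \le 1,
$$

if and only if

$$
L_\alpha(P,R) \ge L_\alpha(P,Q) + L_\alpha(Q,R). \qquad (82)
$$

2) (Tangent hyperplane) Let

$$
Q = \lambda P + (1-\lambda)S, \qquad 0 < \lambda < 1. \qquad (83)
$$

Let $L_\alpha(Q,R)$, $L_\alpha(P,R)$, and $L_\alpha(S,R)$ be finite. The segment joining $P$ and $S$ does not intersect $B(R,r)$ (with $r = L_\alpha(Q,R)$) if and only if

$$
L_\alpha(P,R) = L_\alpha(P,Q) + L_\alpha(Q,R). \qquad (84)
$$

□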

Remarks: 1) Under the hypotheses in part 1) of Proposition 24, we deduce that $L_\alpha(P,Q) < \infty$ as a consequence.

2) The condition (83) implies that $P \le \lambda^{-1}Q$ (i.e., every component satisfies the inequality), and therefore $\mathrm{supp}(P) \subseteq \mathrm{supp}(Q)$. If $0 < \alpha < 1$ and $L_\alpha(Q,R) < \infty$, then we have $\mathrm{supp}(P) \subseteq \mathrm{supp}(Q) \subseteq \mathrm{supp}(R)$. Thus both $L_\alpha(P,R)$ and $L_\alpha(P,Q)$ are necessarily finite. For $\alpha \in (0,1)$, the requirement that $L_\alpha(P,R)$ be finite can therefore be removed. The requirement is, however, needed for $1 < \alpha < \infty$: even though $\mathrm{supp}(P) \subseteq \mathrm{supp}(Q)$ and $\mathrm{supp}(Q)\cap\mathrm{supp}(R) \neq \emptyset$, we may have $\mathrm{supp}(P)\cap\mathrm{supp}(R) = \emptyset$, leading to $L_\alpha(P,R) = \infty$.

3) Part 2) of Proposition 24 extends the analog of Pythagoras' theorem, known to hold for the Kullback-Leibler divergence, to the family $L_\alpha$ parameterized by $\alpha > 0$.

4) By symmetry between $P$ and $S$, (84) holds when $P$ is replaced by $S$.



Proof: 1) Since $L_\alpha(P,R)$ and $L_\alpha(Q,R)$ are finite, we gather from (33) that both $\sum_y\sum_x P(x,y)\,(R'(x|y))^{-\rho}$ and $\sum_y\sum_x Q(x,y)\,(R'(x|y))^{-\rho}$ are finite and nonzero.

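Observe that $P_0 = Q$. Moreover, $L_\alpha(P,R) = \frac{1}{\rho}\log\big(\mathrm{sign}(\rho)\,I(P,R)\big)$ is an increasing function of $I(P,R)$ for either sign of $\rho$, so $L_\alpha(P_\lambda,R) \ge L_\alpha(P_0,R)$ implies that

$$
I(P_\lambda,R) \ge I(P_0,R).
$$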

Thus

$$
\frac{I(P_\lambda,R) - I(P_0,R)}{\lambda} \ge 0 \qquad (85)
$$

for every $\lambda \in (0,1]$. The limiting value as $\lambda \downarrow 0$, namely the derivative of $I(P_\lambda,R)$ with respect to $\lambda$ evaluated at $\lambda = 0$, should be $\ge 0$. This will give us the necessary condition.

Note that the derivative evaluated at $\lambda = 0$ is a one-sided limit, since $\lambda \in [0,1]$. We will first check that this one-sided limit exists.

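From (33), $I(P_\lambda,R)$ can be written as $s(\lambda)/t(\lambda)$, where $t(\lambda)$ is bounded, positive, and lower bounded away from 0, for every $\lambda$. Let $\dot s(0)$ and $\dot t(0)$ be the derivatives of $s$ and $t$ evaluated at $\lambda = 0$. Clearly,

$$
\begin{aligned}
\dot s(0) &= \lim_{\lambda\downarrow 0}\frac{s(\lambda) - s(0)}{\lambda}\\
&= \mathrm{sign}(\rho)\Big(\sum_y\sum_x P(x,y)\,(R'(x|y))^{-\rho} - \sum_y\sum_x Q(x,y)\,(R'(x|y))^{-\rho}\Big).
\end{aligned}
$$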
Similarly, it is easy to check that

$$
\dot t(0) = \sum_y\sum_x P(x,y)\,(Q'(x|y))^{-\rho} - h(Q),
$$

with the possibility that it is $+\infty$ (only when $0 < \alpha < 1$ and $\mathrm{supp}(P) \not\subseteq \mathrm{supp}(Q)$).
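The "easy to check" step can be made explicit by differentiating $h$ term by term. This is a sketch under assumptions: definitions (33) and (34) are not restated in this section, so we take $h(P) = \sum_y\big(\sum_x P(x,y)^\alpha\big)^{1/\alpha}$, $Q'(x|y) = Q(x,y)^\alpha/\sum_a Q(a,y)^\alpha$, and $t(\lambda) = h(P_\lambda)$ with $P_\lambda = \lambda P + (1-\lambda)Q$. These forms are consistent with (86) and (88).

```latex
\begin{align*}
\frac{\partial h(P)}{\partial P(x,y)}
  &= \Big(\sum_a P(a,y)^\alpha\Big)^{\frac{1}{\alpha}-1} P(x,y)^{\alpha-1}
   = \big(P'(x\mid y)\big)^{-\rho}, \qquad \rho = \frac{1-\alpha}{\alpha}, \\
\dot t(0)
  &= \sum_y\sum_x \big(P(x,y) - Q(x,y)\big)\big(Q'(x\mid y)\big)^{-\rho}
   = \sum_y\sum_x P(x,y)\big(Q'(x\mid y)\big)^{-\rho} - h(Q).
\end{align*}
```

The last step uses $\sum_x Q(x,y)(Q'(x|y))^{-\rho} = \big(\sum_x Q(x,y)^\alpha\big)^{1/\alpha}$. For $0 < \alpha < 1$, any pair with $Q(x,y) = 0 < P(x,y)$ contributes $+\infty$, which is the exceptional case noted above.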
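Since we can write

$$
\frac{1}{\lambda}\left(\frac{s(\lambda)}{t(\lambda)} - \frac{s(0)}{t(0)}\right) = \frac{1}{t(\lambda)\,t(0)}\left(t(0)\,\frac{s(\lambda)-s(0)}{\lambda} - s(0)\,\frac{t(\lambda)-t(0)}{\lambda}\right),
$$

it follows that the derivative of $s(\lambda)/t(\lambda)$ exists at $\lambda = 0$ and is given by $\big(\dot s(0)\,t(0) - s(0)\,\dot t(0)\big)/t^2(0)$. It might be $-\infty$: this happens when $\dot t(0) = +\infty$, since then $0 < \alpha < 1$ and hence $s(0) > 0$. However, (85) and $t(0) > 0$ imply that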

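$$
\dot s(0) - s(0)\,\frac{\dot t(0)}{t(0)} \ge 0.
$$

Consequently, $\dot t(0)$ is necessarily finite. In particular, when $0 < \alpha < 1$, we have ascertained that $L_\alpha(P,Q)$ is finite. After substitution of $s(0)$, $t(0)$, $\dot s(0)$, and $\dot t(0)$, we get

$$
\mathrm{sign}(\rho)\cdot\frac{\sum_y\sum_x P(x,y)\,(R'(x|y))^{-\rho}}{\sum_y\sum_x Q(x,y)\,(R'(x|y))^{-\rho}} \ge \mathrm{sign}(\rho)\cdot\frac{\sum_y\sum_x P(x,y)\,(Q'(x|y))^{-\rho}}{h(Q)}. \qquad (86)
$$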



When $-1 < \rho < 0$, clearly, $\sum_y\sum_x P(x,y)\,(Q'(x|y))^{-\rho}$ cannot be zero, due to the nonzero assumptions on the other quantities in (86). This implies that $L_\alpha(P,Q)$ is finite when $1 < \alpha < \infty$ as well. An application of (32) and (33) shows that (86) and (82) are equivalent. This concludes the proof of the forward implication.
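The equivalence of (86) and (82) is a short computation. This sketch assumes, consistently with (78) and (81), that $L_\alpha(P,Q) = \frac{1}{\rho}\log\frac{B_P}{h(P)}$, $L_\alpha(P,R) = \frac{1}{\rho}\log\frac{A_P}{h(P)}$, and $L_\alpha(Q,R) = \frac{1}{\rho}\log\frac{A_Q}{h(Q)}$. The abbreviations are $A_P = \sum_y\sum_x P(x,y)(R'(x|y))^{-\rho}$, $A_Q = \sum_y\sum_x Q(x,y)(R'(x|y))^{-\rho}$, and $B_P = \sum_y\sum_x P(x,y)(Q'(x|y))^{-\rho}$. With these, the difference of the two sides of (82) is:

```latex
\begin{align*}
L_\alpha(P,R) - L_\alpha(P,Q) - L_\alpha(Q,R)
  &= \frac{1}{\rho}\left[\log\frac{A_P}{h(P)} - \log\frac{B_P}{h(P)} - \log\frac{A_Q}{h(Q)}\right] \\
  &= \frac{1}{\rho}\left[\log\frac{A_P}{A_Q} - \log\frac{B_P}{h(Q)}\right].
\end{align*}
```

For positive $a$ and $b$, $\frac{1}{\rho}(\log a - \log b) \ge 0$ holds exactly when $\mathrm{sign}(\rho)\,a \ge \mathrm{sign}(\rho)\,b$, for either sign of $\rho$. With $a = A_P/A_Q$ and $b = B_P/h(Q)$ this is (86).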

The reader will recognize that the basic idea is quite simple: evaluation of a derivative at $\lambda = 0$ and a check that it is nonnegative. The technical details above ensure that the case when the derivative of the denominator is infinite is carefully examined.

For the reverse implication, the hypotheses imply that $L_\alpha(P,R)$, $L_\alpha(Q,R)$, and $L_\alpha(P,Q)$ are finite. As observed above, (86) and (82) are equivalent. Observe that both sides of (86) are linear in $P$; this property will be exploited in the proof. Clearly, if we set $P = Q$ in (82) and (86), we have the equalities



$$
L_\alpha(Q,R) = L_\alpha(Q,Q) + L_\alpha(Q,R) \qquad (87)
$$

and 

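$$
\mathrm{sign}(\rho)\cdot\frac{\sum_y\sum_x Q(x,y)\,(R'(x|y))^{-\rho}}{\sum_y\sum_x Q(x,y)\,(R'(x|y))^{-\rho}} = \mathrm{sign}(\rho)\cdot\frac{\sum_y\sum_x Q(x,y)\,(Q'(x|y))^{-\rho}}{h(Q)}. \qquad (88)
$$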



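A $\lambda$-weighted linear combination (with weights $\lambda$ and $1-\lambda$) of (86) and (88) yields (86) with $P$ replaced by $P_\lambda$. The equivalence of (82) and (86) then results in

$$
L_\alpha(P_\lambda,R) \ge L_\alpha(P_\lambda,Q) + L_\alpha(Q,R) \ge L_\alpha(Q,R),
$$

where the last inequality holds because $L_\alpha$ is nonnegative.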

This concludes the proof of the first part. 

2) This follows easily from the first statement. For the forward implication, (86) holds for $P$. Moreover, (86) holds when $P$ is replaced by $S$. If either of these were a strict inequality, the linear combination of the two with the $\lambda$ given by (83) would satisfy (88) with strict inequality in place of the equality, a contradiction. The reverse implication is straightforward. ■
Let us now apply Proposition 24 to the $L_\alpha$-projection on a convex set. For a convex $\mathcal{E}$, we call $Q$ an algebraic inner point of $\mathcal{E}$ if for every $P \in \mathcal{E}$ there exist $S \in \mathcal{E}$ and $\lambda$ satisfying (83).

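Theorem 25: (Projection Theorem) Let $0 < \alpha < \infty$, $\alpha \neq 1$, and let $\mathcal{X}$ be a finite set. A PMF $Q \in \mathcal{E}\cap B(R,\infty)$ is the $L_\alpha$-projection of $R$ on the convex set $\mathcal{E}$ if and only if every $P \in \mathcal{E}$ satisfies

$$
L_\alpha(P,R) \ge L_\alpha(P,Q) + L_\alpha(Q,R). \qquad (89)
$$

If the $L_\alpha$-projection $Q$ is an algebraic inner point of $\mathcal{E}$, then every $P \in \mathcal{E}\cap B(R,\infty)$ satisfies (89) with equality. □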

Proof: This follows easily from Proposition 24. In the case $L_\alpha(P,R) = \infty$, which Proposition 24 does not cover, (89) holds trivially. ■
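Theorem 25 can be illustrated on the simplest convex set, a segment $\mathcal{E} = \{(1-\lambda)P_0 + \lambda P_1 : 0 \le \lambda \le 1\}$. The Python sketch below is a hypothetical illustration with made-up numbers. It assumes strictly positive PMFs stored as `P[y][x]`, with $h(P) = \sum_y(\sum_x P(x,y)^\alpha)^{1/\alpha}$, $Q'(x|y) = Q(x,y)^\alpha/\sum_a Q(a,y)^\alpha$, and $L_\alpha(P,Q) = \frac{1}{\rho}\log_2\big(\sum_y\sum_x P(x,y)(Q'(x|y))^{-\rho}/h(P)\big)$, forms consistent with (78) and (81). It finds the projection by ternary search over $\lambda$, which is justified by the quasiconvexity in Proposition 22. It then reports the gap $L_\alpha(P,R) - L_\alpha(P,Q) - L_\alpha(Q,R)$ at both endpoints.

```python
import math


def l_alpha(P, Q, alpha):
    # Assumed form of L_alpha in bits; P[y][x] layout, Q strictly positive
    rho = 1.0 / alpha - 1.0
    h = sum(sum(p ** alpha for p in row) ** (1.0 / alpha) for row in P)
    total = 0.0
    for prow, qrow in zip(P, Q):
        z = sum(q ** alpha for q in qrow)
        total += sum(p * (q ** alpha / z) ** (-rho) for p, q in zip(prow, qrow))
    return math.log2(total / h) / rho


def mix(P0, P1, lam):
    return [[(1 - lam) * a + lam * b for a, b in zip(r0, r1)] for r0, r1 in zip(P0, P1)]


def project_on_segment(P0, P1, R, alpha, iters=200):
    # Ternary search for the minimizer of lam -> L_alpha(P_lam, R) on [0, 1];
    # relies on L_alpha(., R) being quasiconvex (Proposition 22).
    def f(lam):
        return l_alpha(mix(P0, P1, lam), R, alpha)

    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if f(m1) <= f(m2):
            hi = m2
        else:
            lo = m1
    lam = (lo + hi) / 2
    return lam, mix(P0, P1, lam)


P0 = [[0.30, 0.10, 0.05], [0.05, 0.20, 0.30]]
P1 = [[0.05, 0.25, 0.20], [0.25, 0.15, 0.10]]
R = [[0.10, 0.10, 0.30], [0.20, 0.20, 0.10]]
alpha = 0.5
lam, Q = project_on_segment(P0, P1, R, alpha)
gaps = [l_alpha(P, R, alpha) - l_alpha(P, Q, alpha) - l_alpha(Q, R, alpha) for P in (P0, P1)]
print(f"lambda* = {lam:.6f}, gaps = {gaps}")
```

By Theorem 25, both gaps should be nonnegative. If the minimizing $\lambda$ lies strictly inside $(0,1)$, then $Q$ is an algebraic inner point of the segment, and both gaps should vanish up to the precision of the search.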

Corollary 26: Let $0 < \alpha < 1$, and let a PMF $Q \in \mathcal{E}\cap B(R,\infty)$ be the $L_\alpha$-projection of $R$ on the convex set $\mathcal{E}$. If $Q$ is an algebraic inner point of $\mathcal{E}$, then every $P \in \mathcal{E}$ satisfies (89) with equality.

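Proof: For any $P \in \mathcal{E}$, we have $\mathrm{supp}(P) \subseteq \mathrm{supp}(Q) \subseteq \mathrm{supp}(R)$. This follows from Remark 2 after Proposition 24, since $Q$ is an algebraic inner point and $L_\alpha(Q,R) < \infty$. Therefore $\mathcal{E} \subseteq B(R,\infty)$. The corollary now follows from the second statement of Theorem 25. ■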

While the existence of an $L_\alpha$-projection is guaranteed for certain sets by Proposition 23, the following result addresses the uniqueness of the projection.

Proposition 27: (Uniqueness of projection) Let $0 < \alpha < \infty$, $\alpha \neq 1$. If the $L_\alpha$-projection of $R$ on the convex set $\mathcal{E}$ exists, it is unique.

Proof: Let $Q_1$ and $Q_2$ be the projections. Then

$$
\infty > L_\alpha(Q_1,R) = L_\alpha(Q_2,R) \ge L_\alpha(Q_2,Q_1) + L_\alpha(Q_1,R),
$$

where the last inequality follows from Theorem 25. Thus $L_\alpha(Q_2,Q_1) = 0$, and $Q_2 = Q_1$. ■
Analogous to the Kullback-Leibler divergence case, our next result is the transitivity property. 

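Theorem 28: Let $\mathcal{E}$ and $\mathcal{E}_1 \subseteq \mathcal{E}$ be convex sets of PMFs on $\mathcal{X}$. Let $R$ have $L_\alpha$-projection $Q$ on $\mathcal{E}$ and $Q_1$ on $\mathcal{E}_1$, and suppose that (89) holds with equality for every $P \in \mathcal{E}$. Then $Q_1$ is the $L_\alpha$-projection of $Q$ on $\mathcal{E}_1$.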

Proof: The proof is the same as in [5, Theorem 2.3]. We repeat it here for completeness. Applying the equality hypothesis to $Q_1 \in \mathcal{E}_1 \subseteq \mathcal{E}$, we have

$$
L_\alpha(Q_1,R) = L_\alpha(Q_1,Q) + L_\alpha(Q,R). \qquad (90)
$$

Consequently, $L_\alpha(Q_1,Q)$ is finite. Furthermore, for $P \in \mathcal{E}_1$, we have

$$
\begin{aligned}
L_\alpha(P,R) &\ge L_\alpha(P,Q_1) + L_\alpha(Q_1,R) \qquad (91)\\
&= L_\alpha(P,Q_1) + L_\alpha(Q_1,Q) + L_\alpha(Q,R), \qquad (92)
\end{aligned}
$$

where (91) follows from Theorem 25 applied to $\mathcal{E}_1$, and (92) follows from (90).

We next compare (92) with $L_\alpha(P,R) = L_\alpha(P,Q) + L_\alpha(Q,R)$ and cancel $L_\alpha(Q,R)$ to obtain

$$
L_\alpha(P,Q) \ge L_\alpha(P,Q_1) + L_\alpha(Q_1,Q)
$$

for every $P \in \mathcal{E}_1$. Theorem 25 guarantees that $Q_1$ is the $L_\alpha$-projection of $Q$ on $\mathcal{E}_1$. ■

As an application of Theorem 25, let us characterize the $L_\alpha$-center of a family.

Proposition 29: If the $L_\alpha$-center of a family $\mathcal{T}$ of PMFs exists, it lies in the closure of the convex hull of the family.

Proof: Let $\mathcal{E}$ be the closure of the convex hull of $\mathcal{T}$. Let $Q^*$ be an $L_\alpha$-center of the family, and let $C$ be the $L_\alpha$-radius, which is at most $\log|\mathcal{X}|$. Our goal is to show that $Q^* \in \mathcal{E}$.

By Proposition 23, $Q^*$ has an $L_\alpha$-projection $Q$ on $\mathcal{E}$, and by Proposition 27 the projection is unique on $\mathcal{E}$. From Theorem 25, for every $P \in \mathcal{T}$, we have

$$
L_\alpha(P,Q^*) \ge L_\alpha(P,Q) + L_\alpha(Q,Q^*).
$$

Thus

$$
\begin{aligned}
C &= \sup_{P\in\mathcal{T}} L_\alpha(P,Q^*)\\
&\ge \sup_{P\in\mathcal{T}} L_\alpha(P,Q) + L_\alpha(Q,Q^*)\\
&\ge C + L_\alpha(Q,Q^*).
\end{aligned}
$$

Thus $L_\alpha(Q,Q^*) = 0$, leading to $Q^* = Q \in \mathcal{E}$. ■

For the special case when $|\mathcal{T}| = m$ is finite, i.e., $\mathcal{T} = \{P_1,\dots,P_m\}$, we found the weight vector $w$ such that $Q^* = \sum_{i=1}^m w(i)P_i$, $\sum_{i=1}^m w(i) = 1$. This was done in an explicit fashion in Section VI-A.2 using results on $f$-divergences.
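For a two-member family $\mathcal{T} = \{P_1, P_2\}$, Proposition 29 suggests searching for the $L_\alpha$-center only on the segment joining $P_1$ and $P_2$. The Python sketch below does this by brute-force grid search. It is not the explicit $f$-divergence construction referred to above. It assumes strictly positive PMFs stored as `P[y][x]`, with $h(P) = \sum_y(\sum_x P(x,y)^\alpha)^{1/\alpha}$, $Q'(x|y) = Q(x,y)^\alpha/\sum_a Q(a,y)^\alpha$, and $L_\alpha(P,Q) = \frac{1}{\rho}\log_2\big(\sum_y\sum_x P(x,y)(Q'(x|y))^{-\rho}/h(P)\big)$. The example numbers and helper names are hypothetical.

```python
import math


def l_alpha(P, Q, alpha):
    # Assumed form of L_alpha in bits; P[y][x] layout, Q strictly positive
    rho = 1.0 / alpha - 1.0
    h = sum(sum(p ** alpha for p in row) ** (1.0 / alpha) for row in P)
    total = 0.0
    for prow, qrow in zip(P, Q):
        z = sum(q ** alpha for q in qrow)
        total += sum(p * (q ** alpha / z) ** (-rho) for p, q in zip(prow, qrow))
    return math.log2(total / h) / rho


def mix(P1, P2, lam):
    return [[(1 - lam) * a + lam * b for a, b in zip(r1, r2)] for r1, r2 in zip(P1, P2)]


def center_on_segment(P1, P2, alpha, grid=2001):
    # Grid search over Q_lam = (1 - lam) P1 + lam P2 for the smallest worst-case L_alpha(P, Q_lam)
    best_radius, best_lam = math.inf, None
    for k in range(grid):
        lam = k / (grid - 1)
        Q = mix(P1, P2, lam)
        radius = max(l_alpha(P1, Q, alpha), l_alpha(P2, Q, alpha))
        if radius < best_radius:
            best_radius, best_lam = radius, lam
    return best_radius, best_lam


P1 = [[0.30, 0.10, 0.05], [0.05, 0.20, 0.30]]
P2 = [[0.05, 0.25, 0.20], [0.25, 0.15, 0.10]]
radius, lam = center_on_segment(P1, P2, alpha=0.5)
print(f"approximate radius {radius:.4f} bits at lambda = {lam:.4f}; log2|X| = {math.log2(3):.4f}")
```

The grid value only upper-bounds the radius restricted to the segment. Refining the grid, or replacing it with a one-dimensional minimizer, sharpens the estimate.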



IX. Concluding remarks 

We conclude this paper by applying some of our results to guessing strings of length $n$ with letters in $\mathcal{A}$. Let $\mathcal{X} = \mathcal{A}^n$, $m = |\mathcal{A}|$, and let $P$ be a PMF on $\mathcal{A}$. Let

$$
P_n(x^n) = \prod_{i=1}^n P(x_i)
$$

denote the PMF of the discrete memoryless source (DMS), where $x^n = (x_1, x_2, \dots, x_n)$ is the $n$-string. Theorem 5 says that for $\rho = 1$, the minimum expected number of guesses grows exponentially with $n$, with growth rate $H_{1/2}(P)$.
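The growth rate $H_{1/2}(P)$ can be seen on a small example. The Python sketch below uses a hypothetical binary source $P = (0.7, 0.3)$ and $\rho = 1$. It enumerates all $2^n$ strings, guesses them in decreasing order of $P_n$-probability, and compares $\frac{1}{n}\log_2 E[G]$ with $H_{1/2}(P) = 2\log_2\sum_a\sqrt{P(a)}$. The test relies on Arikan's well-known bounds for the optimal strategy at $\rho = 1$: $(1+n\ln m)^{-1}\,2^{nH_{1/2}(P)} \le E[G] \le 2^{nH_{1/2}(P)}$.

```python
import math
from itertools import product


def renyi_half(p):
    # H_{1/2}(P) = 2 log2 sum_a sqrt(P(a)), in bits
    return 2.0 * math.log2(sum(math.sqrt(x) for x in p))


def optimal_expected_guesses(p, n):
    # E[G] when all n-strings are guessed in decreasing order of P_n(x^n) = prod_i P(x_i)
    probs = sorted(
        (math.prod(p[a] for a in s) for s in product(range(len(p)), repeat=n)),
        reverse=True,
    )
    return sum(rank * q for rank, q in enumerate(probs, start=1))


p = (0.7, 0.3)
for n in (4, 8, 12):
    e = optimal_expected_guesses(p, n)
    print(n, math.log2(e) / n, renyi_half(p))
```

The normalized exponent approaches $H_{1/2}(P)$ only slowly, because the lower bound carries the polynomial factor $(1+n\ln m)^{-1}$.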

If the only information the guesser has about the source is that $P_n \in \mathcal{T}$, the guesser suffers a penalty (interchangeably called redundancy): the growth rate of the minimum expected number of guesses is larger than that achievable with knowledge of $P_n$. The increase in growth rate is given by the normalized redundancy $R(P_n,G)/n$, where $G$ is the guessing strategy chosen to work for all sources in $\mathcal{T}$. This normalized redundancy equals the normalized $L_{1/2}$-radius of $\mathcal{T}$, i.e., $C_n/n$, where $C_n$ is given by (21), to within $\log(1 + n\ln m)$.

When $P_n$ is a DMS and the PMF $P$ on $\mathcal{A}$ is unknown to the guesser, Arikan and Merhav [6] have shown that guessing strings in increasing order of their empirical entropies is a universal strategy. Their universality result is implied by the fact that the normalized $L_{1/2}$-radius of the family of DMSs satisfies $C_n/n \to 0$. The family of DMSs is thus not rich enough from the point of view of guessing. Knowledge of the PMF $P$ is not needed; the universal strategy achieves, asymptotically, the minimum growth rate achievable with full knowledge of the source statistics.
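The universal ordering can be sketched for binary strings. The Python code below is a hypothetical illustration, not Arikan and Merhav's construction in full generality. It lists all $n$-strings in increasing order of empirical entropy, a list that does not depend on $P$. It then compares the resulting $E[G]$ under a DMS with the optimal value obtained when $P$ is known. The normalized gap $\frac{1}{n}\log_2(E_{\rm univ}/E_{\rm opt})$ is expected to shrink as $n$ grows, in line with $C_n/n \to 0$, but at these small block lengths it need not be small.

```python
import math
from itertools import product


def empirical_entropy(s):
    # Entropy (bits) of the type (empirical distribution) of a binary string s
    n, k = len(s), sum(s)
    return -sum((c / n) * math.log2(c / n) for c in (k, n - k) if c > 0)


def prob(s, p1):
    # P_n(s) for a binary DMS with P(1) = p1
    k = sum(s)
    return p1 ** k * (1 - p1) ** (len(s) - k)


def expected_guesses(order, p1):
    return sum(rank * prob(s, p1) for rank, s in enumerate(order, start=1))


def universal_order(n):
    # Source-independent list: increasing empirical entropy (ties stay in lexicographic order)
    return sorted(product((0, 1), repeat=n), key=empirical_entropy)


def optimal_order(n, p1):
    # Source-dependent list: decreasing probability under the true DMS
    return sorted(product((0, 1), repeat=n), key=lambda s: -prob(s, p1))


n = 12
univ = universal_order(n)
for p1 in (0.1, 0.3, 0.5):
    e_univ = expected_guesses(univ, p1)
    e_opt = expected_guesses(optimal_order(n, p1), p1)
    print(p1, math.log2(e_univ / e_opt) / n)
```

For the fair coin ($p_1 = 0.5$), every ordering has the same expected number of guesses, so the gap is zero there.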

Suppose now that $\mathcal{A} = \{0,1\}$; we may think of an $n$-string as the outcome of independent coin tosses. Suppose further that two biased coins are available. To generate each $X_i$, one of the two coins is chosen arbitrarily and tossed. The outcome of the toss determines $X_i$. This is a two-state arbitrarily varying source. We may assume $\mathcal{S} = \{a, b\}$. Let us assume that as $n \to \infty$, the fraction of time when the first coin is picked approaches a limit $U^*(a)$. Let us further assume that for each $n$, the guesser knows how many times the first coin was picked, i.e., it knows the type of the state sequence. If the two coins are not statistically identical, the normalized $L_{1/2}$-radius approaches a strictly positive constant as $n \to \infty$. This implies that the growth rate of the minimum expected number of guesses for a strategy without full knowledge of the source statistics is strictly larger than that achievable with full knowledge. We note that to maximize the expected number of guesses, the right solution may be to pick one coin, the one with the higher entropy, all the time.

The guesser's lack of knowledge of the number of times the first coin is picked results in additional redundancy. However, this additional redundancy asymptotically vanishes. The guesser "stitches" together the best guessing lists for each type of state sequence.